Overview

Dataset statistics

Number of variables: 36
Number of observations: 78032
Missing cells: 647063
Missing cells (%): 23.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 20.9 MiB
Average record size in memory: 281.0 B
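The missing-cell percentage can be checked by hand from the counts above. The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not taken from the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 36
n_observations = 78032
missing_cells = 647063

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"{total_cells} cells in total, {missing_pct:.1f}% missing")
```

The result agrees with the reported 23.0% after rounding to one decimal place.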

Variable types

Numeric: 11
Categorical: 23
Boolean: 2

Alerts

Name has a high cardinality: 22710 distinct values [High cardinality]
Address has a high cardinality: 6618 distinct values [High cardinality]
StreetName has a high cardinality: 669 distinct values [High cardinality]
BldgNo has a high cardinality: 94 distinct values [High cardinality]
UnitNo has a high cardinality: 3322 distinct values [High cardinality]
PostalCode has a high cardinality: 2901 distinct values [High cardinality]
Location has a high cardinality: 56 distinct values [High cardinality]
NAICSDescr has a high cardinality: 1039 distinct values [High cardinality]
Phone has a high cardinality: 25064 distinct values [High cardinality]
Fax has a high cardinality: 15752 distinct values [High cardinality]
TollFree has a high cardinality: 4117 distinct values [High cardinality]
EMail has a high cardinality: 15058 distinct values [High cardinality]
WebAddress has a high cardinality: 14200 distinct values [High cardinality]
EmplUpdate has a high cardinality: 433 distinct values [High cardinality]
Character has a high cardinality: 56 distinct values [High cardinality]
CHArea has a high cardinality: 57 distinct values [High cardinality]
Modified has a high cardinality: 189 distinct values [High cardinality]
X is highly overall correlated with Y and 2 other fields [High correlation]
Y is highly overall correlated with X and 2 other fields [High correlation]
BusinessID is highly overall correlated with FID and 2 other fields [High correlation]
Ward is highly overall correlated with FID and 8 other fields [High correlation]
CENT_X is highly overall correlated with Location and 2 other fields [High correlation]
CENT_Y is highly overall correlated with Location and 2 other fields [High correlation]
Year is highly overall correlated with X and 3 other fields [High correlation]
RecordID is highly overall correlated with FID and 2 other fields [High correlation]
isnew is highly overall correlated with X and 2 other fields [High correlation]
CHArea is highly overall correlated with FID and 6 other fields [High correlation]
Character is highly overall correlated with FID and 4 other fields [High correlation]
Sector_Des is highly overall correlated with NAICSCat [High correlation]
BIAFulName is highly overall correlated with FID and 3 other fields [High correlation]
BIA_NAME is highly overall correlated with FID and 3 other fields [High correlation]
Closed is highly overall correlated with BIAFulName and 1 other field [High correlation]
FID is highly overall correlated with BusinessID and 8 other fields [High correlation]
BldgNo is highly overall correlated with Location and 2 other fields [High correlation]
Location is highly overall correlated with FID and 7 other fields [High correlation]
NAICSCode is highly overall correlated with NAICSCat [High correlation]
NAICSCat is highly overall correlated with Location and 5 other fields [High correlation]
PIN is highly overall correlated with FID and 3 other fields [High correlation]
X has 48605 (62.3%) missing values [Missing]
Y has 48605 (62.3%) missing values [Missing]
Location has 47693 (61.1%) missing values [Missing]
EmplUpdate has 15002 (19.2%) missing values [Missing]
Sector_Des has 63430 (81.3%) missing values [Missing]
CENT_X has 47693 (61.1%) missing values [Missing]
CENT_Y has 47693 (61.1%) missing values [Missing]
PIN has 30339 (38.9%) missing values [Missing]
Character has 61682 (79.0%) missing values [Missing]
CHArea has 46689 (59.8%) missing values [Missing]
Modified has 63217 (81.0%) missing values [Missing]
BIA_NAME has 63207 (81.0%) missing values [Missing]
BIAFulName has 63207 (81.0%) missing values [Missing]
StreetNo is highly skewed (γ1 = 147.6524357) [Skewed]
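Several of the missing-value alerts above report identical counts (X and Y both have 48605 missing values, for example), which hints that those columns may be empty on the same rows. The following is a minimal sketch that groups the alerted columns by their missing count, using only the figures listed above. A shared count suggests a shared missingness pattern but does not prove it; confirming it would need the row-level data.

```python
from collections import defaultdict

# Missing-value counts copied from the alerts list.
missing = {
    "X": 48605, "Y": 48605, "Location": 47693, "EmplUpdate": 15002,
    "Sector_Des": 63430, "CENT_X": 47693, "CENT_Y": 47693, "PIN": 30339,
    "Character": 61682, "CHArea": 46689, "Modified": 63217,
    "BIA_NAME": 63207, "BIAFulName": 63207,
}

# Group column names by their missing count.
groups = defaultdict(list)
for column, count in missing.items():
    groups[count].append(column)

# Keep only counts shared by more than one column.
shared = {count: cols for count, cols in groups.items() if len(cols) > 1}
for count, cols in sorted(shared.items(), reverse=True):
    print(count, cols)

# Total missing cells across the alerted columns.
alerted_total = sum(missing.values())
print("missing cells in alerted columns:", alerted_total)
```

The alerted columns account for 647062 missing cells, one fewer than the 647063 reported in the dataset statistics, so the remaining missing cell presumably sits in a column that did not trigger an alert.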

Reproduction

Analysis started: 2023-03-04 22:45:32.820108
Analysis finished: 2023-03-04 22:46:38.756839
Duration: 1 minute and 5.94 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json
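The reported duration follows directly from the two timestamps. The following is a minimal sketch of that check using the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the Reproduction section.
started = datetime.fromisoformat("2023-03-04 22:45:32.820108")
finished = datetime.fromisoformat("2023-03-04 22:46:38.756839")

# Elapsed wall-clock time in seconds, split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minute(s) and {seconds:.2f} seconds")
```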

Variables

X
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 8684
Distinct (%): 29.5%
Missing: 48605
Missing (%): 62.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 306553.47
Minimum: -79.80298
Maximum: 617060.11
Zeros: 0
Zeros (%): 0.0%
Negative: 14602
Negative (%): 18.7%
Memory size: 609.8 KiB

Quantile statistics

Minimum: -79.80298
5-th percentile: -79.716419
Q1: -79.64992
Median: 598535.65
Q3: 608829.52
95-th percentile: 613567.3
Maximum: 617060.11
Range: 617139.91
Interquartile range (IQR): 608909.17

Descriptive statistics

Standard deviation: 304335.28
Coefficient of variation (CV): 0.99276409
Kurtosis: -1.9996012
Mean: 306553.47
Median Absolute Deviation (MAD): 17202.025
Skewness: -0.014922506
Sum: 9.0209489 × 10^9
Variance: 9.261996 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
609566.1112 | 201 | 0.3%
-79.64275968 | 185 | 0.2%
-79.60364656 | 123 | 0.2%
607701.737 | 121 | 0.2%
-79.71222857 | 113 | 0.1%
-79.63864759 | 107 | 0.1%
604057.4854 | 101 | 0.1%
609718.3353 | 100 | 0.1%
-79.56936408 | 91 | 0.1%
615498.4771 | 66 | 0.1%
Other values (8674) | 28219 | 36.2%
(Missing) | 48605 | 62.3%
Value | Count | Frequency (%)
-79.80298035 | 1 | < 0.1%
-79.8014612 | 1 | < 0.1%
-79.79447393 | 1 | < 0.1%
-79.79439767 | 1 | < 0.1%
-79.78884298 | 1 | < 0.1%
-79.78871792 | 20 | < 0.1%
-79.78850259 | 1 | < 0.1%
-79.78675536 | 5 | < 0.1%
-79.78630211 | 12 | < 0.1%
-79.78452433 | 11 | < 0.1%
Value | Count | Frequency (%)
617060.1055 | 1 | < 0.1%
616918.4738 | 1 | < 0.1%
616839.6893 | 1 | < 0.1%
616837.5953 | 1 | < 0.1%
616769.3441 | 1 | < 0.1%
616704.5391 | 1 | < 0.1%
616692.2284 | 1 | < 0.1%
616667.6043 | 1 | < 0.1%
616657.8816 | 1 | < 0.1%
616643.3766 | 1 | < 0.1%
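Three of the statistics reported for X (range, interquartile range and coefficient of variation) are simple functions of other reported values. The following is a minimal sketch that recomputes them from the rounded figures above; small differences in the last digits are expected because the inputs are already rounded.

```python
# Rounded figures copied from the X statistics above.
minimum, maximum = -79.80298, 617060.11
q1, q3 = -79.64992, 608829.52
mean, std = 306553.47, 304335.28

value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # CV = standard deviation / mean

print(f"range={value_range:.2f} iqr={iqr:.2f} cv={cv:.6f}")
```

Because Q1 is close to the minimum and Q3 is close to the maximum, the IQR covers almost the whole range.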

Y
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 8684
Distinct (%): 29.5%
Missing: 48605
Missing (%): 62.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 2433290.7
Minimum: 43.48517
Maximum: 4843106.9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 43.48517
5-th percentile: 43.53859
Q1: 43.608514
Median: 4818092
Q3: 4829966.3
95-th percentile: 4838021.6
Maximum: 4843106.9
Range: 4843063.4
Interquartile range (IQR): 4829922.6

Descriptive statistics

Standard deviation: 2414921.5
Coefficient of variation (CV): 0.99245088
Kurtosis: -1.9998953
Mean: 2433290.7
Median Absolute Deviation (MAD): 23561.033
Skewness: -0.015148997
Sum: 7.1604446 × 10^10
Variance: 5.8318459 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
4827535.97 | 201 | 0.3%
43.59351505 | 185 | 0.2%
43.67999884 | 123 | 0.2%
4838234.833 | 121 | 0.2%
43.55837136 | 113 | 0.1%
43.72011759 | 107 | 0.1%
4823601.861 | 101 | 0.1%
4841653.08 | 100 | 0.1%
43.5935916 | 91 | 0.1%
4827677.175 | 66 | 0.1%
Other values (8674) | 28219 | 36.2%
(Missing) | 48605 | 62.3%
Value | Count | Frequency (%)
43.48517014 | 1 | < 0.1%
43.48968489 | 1 | < 0.1%
43.4915708 | 1 | < 0.1%
43.49199992 | 2 | < 0.1%
43.49224252 | 1 | < 0.1%
43.49454092 | 1 | < 0.1%
43.49517064 | 1 | < 0.1%
43.49608236 | 1 | < 0.1%
43.49636475 | 1 | < 0.1%
43.49652992 | 2 | < 0.1%
Value | Count | Frequency (%)
4843106.933 | 3 | < 0.1%
4843045.912 | 1 | < 0.1%
4842995.781 | 2 | < 0.1%
4842852.901 | 1 | < 0.1%
4842722.486 | 1 | < 0.1%
4842531.982 | 2 | < 0.1%
4842304.058 | 2 | < 0.1%
4842274.717 | 1 | < 0.1%
4842274.399 | 2 | < 0.1%
4842200.556 | 2 | < 0.1%
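The quantiles for X and Y point to two different scales inside each column: X values cluster near -79.7 and near 600000, while Y values cluster near 43.6 and near 4.8 million. One possible reading, which the report itself does not state, is that some rows hold longitude and latitude in degrees and others hold projected coordinates in metres. The following is a minimal sketch, under that assumption, of separating the two groups by magnitude. The helper `looks_like_degrees` and its 180-degree cutoff are illustrative choices, and the sample values are copied from the X frequency table.

```python
from collections import Counter

def looks_like_degrees(value, limit=180.0):
    """Heuristic (an assumption): geographic degrees never exceed 180 in magnitude."""
    return abs(value) <= limit

# A few X values copied from the frequency table above.
x_values = [609566.1112, -79.64275968, -79.60364656, 607701.737, -79.71222857]

# Count how many sample values fall on each side of the cutoff.
counts = Counter(
    "degrees" if looks_like_degrees(v) else "projected" for v in x_values
)
print(counts)
```

If the assumption holds, the two groups would need to be converted to a common coordinate system before X and Y are summarised together; the near-zero skewness and kurtosis close to -2 reported for both columns are consistent with such a two-cluster mix.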

FID
Real number (ℝ)

Distinct: 16518
Distinct (%): 21.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7823.2043
Minimum: 1
Maximum: 16518
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 781
Q1: 3902
Median: 7804
Q3: 11705.25
95-th percentile: 14902
Maximum: 16518
Range: 16517
Interquartile range (IQR): 7803.25

Descriptive statistics

Standard deviation: 4538.5029
Coefficient of variation (CV): 0.58013351
Kurtosis: -1.1665353
Mean: 7823.2043
Median Absolute Deviation (MAD): 3902
Skewness: 0.024756244
Sum: 6.1046028 × 10^8
Variance: 20598009
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 5
 
< 0.1%
9727 5
 
< 0.1%
9729 5
 
< 0.1%
9730 5
 
< 0.1%
9731 5
 
< 0.1%
9732 5
 
< 0.1%
9733 5
 
< 0.1%
9734 5
 
< 0.1%
9735 5
 
< 0.1%
9736 5
 
< 0.1%
Other values (16508) 77982
99.9%
ValueCountFrequency (%)
1 5
< 0.1%
2 5
< 0.1%
3 5
< 0.1%
4 5
< 0.1%
5 5
< 0.1%
6 5
< 0.1%
7 5
< 0.1%
8 5
< 0.1%
9 5
< 0.1%
10 5
< 0.1%
ValueCountFrequency (%)
16518 1
< 0.1%
16517 1
< 0.1%
16516 1
< 0.1%
16515 1
< 0.1%
16514 1
< 0.1%
16513 1
< 0.1%
16512 1
< 0.1%
16511 1
< 0.1%
16510 1
< 0.1%
16509 1
< 0.1%

BusinessID
Real number (ℝ)

Distinct: 21240
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 34656.267
Minimum: 2
Maximum: 94424
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2230
Q1: 9764
Median: 19182.5
Q3: 55026
95-th percentile: 88915
Maximum: 94424
Range: 94422
Interquartile range (IQR): 45262

Descriptive statistics

Standard deviation: 29857.312
Coefficient of variation (CV): 0.86152708
Kurtosis: -0.99364033
Mean: 34656.267
Median Absolute Deviation (MAD): 16019.5
Skewness: 0.65057392
Sum: 2.7042978 × 10^9
Variance: 8.9145909 × 10^8
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1055 5
 
< 0.1%
20882 5
 
< 0.1%
19580 5
 
< 0.1%
20871 5
 
< 0.1%
19831 5
 
< 0.1%
19332 5
 
< 0.1%
19583 5
 
< 0.1%
19832 5
 
< 0.1%
19584 5
 
< 0.1%
20872 5
 
< 0.1%
Other values (21230) 77982
99.9%
ValueCountFrequency (%)
2 2
 
< 0.1%
7 5
< 0.1%
10 5
< 0.1%
12 3
< 0.1%
16 5
< 0.1%
18 5
< 0.1%
20 5
< 0.1%
21 5
< 0.1%
23 5
< 0.1%
26 4
< 0.1%
ValueCountFrequency (%)
94424 1
< 0.1%
94423 1
< 0.1%
94419 1
< 0.1%
94371 1
< 0.1%
94321 1
< 0.1%
94319 1
< 0.1%
94318 1
< 0.1%
94317 1
< 0.1%
94313 1
< 0.1%
94293 1
< 0.1%

Name
Categorical

Distinct: 22710
Distinct (%): 29.1%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB
Subway
 
212
Tim Hortons
 
181
Petro Canada
 
123
Shoppers Drug Mart
 
102
Tim Horton's
 
97
Other values (22705)
77317 

Length

Max length: 118
Median length: 76
Mean length: 22.654539
Min length: 1

Characters and Unicode

Total characters: 1767779
Distinct characters: 93
Distinct categories: 15
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
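As an illustration of the character properties mentioned above, Python's standard `unicodedata` module exposes the general category of each code point. The following is a minimal sketch that tallies categories over a list of strings; the helper name `category_counts` is illustrative, and this is not necessarily how the report computes its own tables.

```python
import unicodedata
from collections import Counter

def category_counts(values):
    """Count Unicode general categories (e.g. 'Lu', 'Ll', 'Zs') over all characters."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts

# One business name taken from the Name column's common values.
counts = category_counts(["Tim Horton's"])
print(counts)
```

In this output `Lu` is Uppercase Letter, `Ll` is Lowercase Letter, `Zs` is Space Separator and `Po` is Other Punctuation, matching the category names used in the tables of this report.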

Unique

Unique: 5010
Unique (%): 6.4%

Sample

1st row: Golf Trends Inc.
2nd row: Apex Graphics Inc.
3rd row: Sands, John & Associates Limited
4th row: Printmedia-Tackaberry Times
5th row: S W R Industries Ltd.

Common Values

Value | Count | Frequency (%)
Subway | 212 | 0.3%
Tim Hortons | 181 | 0.2%
Petro Canada | 123 | 0.2%
Shoppers Drug Mart | 102 | 0.1%
Tim Horton's | 97 | 0.1%
PLASP Child Care Centre | 96 | 0.1%
Dollarama | 92 | 0.1%
Starbucks | 88 | 0.1%
Shell Canada | 84 | 0.1%
Royal Bank of Canada | 78 | 0.1%
Other values (22700) | 76879 | 98.5%
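The common values list "Tim Hortons" (181) and "Tim Horton's" (97) as separate entries, which suggests that spelling variants inflate the distinct count for Name. The following is a minimal sketch of one possible normalisation (lower-casing, dropping apostrophes, collapsing whitespace). The `normalise` helper and its rules are assumptions chosen for illustration, not something the report applies.

```python
import re
from collections import Counter

# A few counts copied from the common values table above.
counts = Counter({
    "Subway": 212,
    "Tim Hortons": 181,
    "Tim Horton's": 97,
    "Starbucks": 88,
})

def normalise(name):
    """Lower-case, drop apostrophes and collapse runs of whitespace."""
    name = name.lower().replace("'", "")
    return re.sub(r"\s+", " ", name).strip()

# Merge counts of names that normalise to the same key.
merged = Counter()
for name, n in counts.items():
    merged[normalise(name)] += n

print(merged.most_common())
```

Under these rules the two Tim Hortons spellings merge into a single entry, which would then outrank Subway in this small sample.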

Length

Histogram of lengths of the category
ValueCountFrequency (%)
inc 15794
 
5.7%
9127
 
3.3%
ltd 7946
 
2.9%
canada 4795
 
1.7%
centre 2969
 
1.1%
and 2598
 
0.9%
services 2443
 
0.9%
the 2359
 
0.8%
a 2092
 
0.8%
of 2044
 
0.7%
Other values (16113) 225478
81.2%

Most occurring characters

ValueCountFrequency (%)
199927
 
11.3%
e 132589
 
7.5%
a 128136
 
7.2%
n 115216
 
6.5%
i 104250
 
5.9%
r 101893
 
5.8%
o 97613
 
5.5%
t 94807
 
5.4%
s 77470
 
4.4%
l 62777
 
3.6%
Other values (83) 653101
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1236769
70.0%
Uppercase Letter 275469
 
15.6%
Space Separator 199927
 
11.3%
Other Punctuation 44368
 
2.5%
Decimal Number 4222
 
0.2%
Dash Punctuation 4194
 
0.2%
Close Punctuation 1272
 
0.1%
Open Punctuation 1266
 
0.1%
Math Symbol 178
 
< 0.1%
Final Punctuation 99
 
< 0.1%
Other values (5) 15
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 132589
10.7%
a 128136
10.4%
n 115216
9.3%
i 104250
 
8.4%
r 101893
 
8.2%
o 97613
 
7.9%
t 94807
 
7.7%
s 77470
 
6.3%
l 62777
 
5.1%
c 60202
 
4.9%
Other values (20) 261816
21.2%
Uppercase Letter
ValueCountFrequency (%)
C 35962
13.1%
S 28667
 
10.4%
I 23883
 
8.7%
M 18395
 
6.7%
L 18128
 
6.6%
A 17083
 
6.2%
P 16975
 
6.2%
T 15559
 
5.6%
D 13515
 
4.9%
B 11145
 
4.0%
Other values (17) 76157
27.6%
Other Punctuation
ValueCountFrequency (%)
. 29521
66.5%
& 7166
 
16.2%
, 3463
 
7.8%
' 3108
 
7.0%
/ 898
 
2.0%
: 88
 
0.2%
# 35
 
0.1%
@ 29
 
0.1%
! 26
 
0.1%
" 16
 
< 0.1%
Other values (2) 18
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 906
21.5%
2 760
18.0%
0 712
16.9%
4 418
9.9%
3 334
 
7.9%
9 287
 
6.8%
8 245
 
5.8%
7 197
 
4.7%
5 184
 
4.4%
6 179
 
4.2%
Math Symbol
ValueCountFrequency (%)
+ 152
85.4%
| 25
 
14.0%
> 1
 
0.6%
Close Punctuation
ValueCountFrequency (%)
) 1264
99.4%
] 8
 
0.6%
Space Separator
ValueCountFrequency (%)
199927
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 4194
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1266
100.0%
Final Punctuation
ValueCountFrequency (%)
99
100.0%
Control
ValueCountFrequency (%)
6
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
3
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 2
100.0%
Other Symbol
ValueCountFrequency (%)
© 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1512238
85.5%
Common 255541
 
14.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 132589
 
8.8%
a 128136
 
8.5%
n 115216
 
7.6%
i 104250
 
6.9%
r 101893
 
6.7%
o 97613
 
6.5%
t 94807
 
6.3%
s 77470
 
5.1%
l 62777
 
4.2%
c 60202
 
4.0%
Other values (47) 537285
35.5%
Common
ValueCountFrequency (%)
199927
78.2%
. 29521
 
11.6%
& 7166
 
2.8%
- 4194
 
1.6%
, 3463
 
1.4%
' 3108
 
1.2%
( 1266
 
0.5%
) 1264
 
0.5%
1 906
 
0.4%
/ 898
 
0.4%
Other values (26) 3828
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1767601
> 99.9%
Punctuation 102
 
< 0.1%
None 76
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
199927
 
11.3%
e 132589
 
7.5%
a 128136
 
7.2%
n 115216
 
6.5%
i 104250
 
5.9%
r 101893
 
5.8%
o 97613
 
5.5%
t 94807
 
5.4%
s 77470
 
4.4%
l 62777
 
3.6%
Other values (75) 652923
36.9%
Punctuation
ValueCountFrequency (%)
99
97.1%
3
 
2.9%
None
ValueCountFrequency (%)
é 67
88.2%
ü 4
 
5.3%
ē 2
 
2.6%
É 1
 
1.3%
ä 1
 
1.3%
© 1
 
1.3%

Address
Categorical

Distinct: 6618
Distinct (%): 8.5%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB
100 City Centre Dr
 
953
5100 Erin Mills Pky
 
523
7205 Goreway Dr
 
483
1250 South Service Rd
 
394
1550 South Gateway Rd
 
284
Other values (6613)
75395 

Length

Max length: 32
Median length: 27
Mean length: 16.625525
Min length: 5

Characters and Unicode

Total characters: 1297323
Distinct characters: 64
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 292
Unique (%): 0.4%

Sample

1st row: 300 Ambassador Dr
2nd row: 320 Ambassador Dr
3rd row: 320 Ambassador Dr
4th row: 320 Ambassador Dr
5th row: 321 Ambassador Dr

Common Values

ValueCountFrequency (%)
100 City Centre Dr 953
 
1.2%
5100 Erin Mills Pky 523
 
0.7%
7205 Goreway Dr 483
 
0.6%
1250 South Service Rd 394
 
0.5%
1550 South Gateway Rd 284
 
0.4%
4141 Dixie Rd 248
 
0.3%
2225 Erin Mills Pky 238
 
0.3%
50 Burnhamthorpe Rd W 229
 
0.3%
2355 Derry Rd E 212
 
0.3%
2000 Credit Valley Rd 212
 
0.3%
Other values (6608) 74256
95.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 28597
 
10.8%
dr 17907
 
6.8%
e 12047
 
4.6%
st 9954
 
3.8%
blvd 8013
 
3.0%
w 7245
 
2.7%
dundas 4805
 
1.8%
ave 3977
 
1.5%
matheson 2625
 
1.0%
pky 2579
 
1.0%
Other values (3761) 165836
62.9%

Most occurring characters

ValueCountFrequency (%)
185556
 
14.3%
r 77071
 
5.9%
e 71979
 
5.5%
a 58783
 
4.5%
d 55945
 
4.3%
0 51078
 
3.9%
n 49722
 
3.8%
5 48031
 
3.7%
t 47992
 
3.7%
i 45039
 
3.5%
Other values (54) 606127
46.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 636946
49.1%
Decimal Number 287140
22.1%
Uppercase Letter 187144
 
14.4%
Space Separator 185556
 
14.3%
Dash Punctuation 480
 
< 0.1%
Other Punctuation 54
 
< 0.1%
Modifier Symbol 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 77071
12.1%
e 71979
11.3%
a 58783
9.2%
d 55945
8.8%
n 49722
 
7.8%
t 47992
 
7.5%
i 45039
 
7.1%
o 36413
 
5.7%
l 32505
 
5.1%
s 27700
 
4.3%
Other values (15) 133797
21.0%
Uppercase Letter
ValueCountFrequency (%)
R 31751
17.0%
D 29023
15.5%
S 18789
10.0%
E 16442
8.8%
B 14485
7.7%
C 13381
7.2%
W 11748
 
6.3%
M 9512
 
5.1%
A 9382
 
5.0%
T 6499
 
3.5%
Other values (14) 26132
14.0%
Decimal Number
ValueCountFrequency (%)
0 51078
17.8%
5 48031
16.7%
1 41652
14.5%
2 31311
10.9%
3 25187
8.8%
6 23265
8.1%
7 20531
7.2%
4 17381
 
6.1%
9 14549
 
5.1%
8 14155
 
4.9%
Other Punctuation
ValueCountFrequency (%)
' 46
85.2%
. 8
 
14.8%
Space Separator
ValueCountFrequency (%)
185556
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 480
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 824090
63.5%
Common 473233
36.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 77071
 
9.4%
e 71979
 
8.7%
a 58783
 
7.1%
d 55945
 
6.8%
n 49722
 
6.0%
t 47992
 
5.8%
i 45039
 
5.5%
o 36413
 
4.4%
l 32505
 
3.9%
R 31751
 
3.9%
Other values (39) 316890
38.5%
Common
ValueCountFrequency (%)
185556
39.2%
0 51078
 
10.8%
5 48031
 
10.1%
1 41652
 
8.8%
2 31311
 
6.6%
3 25187
 
5.3%
6 23265
 
4.9%
7 20531
 
4.3%
4 17381
 
3.7%
9 14549
 
3.1%
Other values (5) 14692
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1297323
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
185556
 
14.3%
r 77071
 
5.9%
e 71979
 
5.5%
a 58783
 
4.5%
d 55945
 
4.3%
0 51078
 
3.9%
n 49722
 
3.8%
5 48031
 
3.7%
t 47992
 
3.7%
i 45039
 
3.5%
Other values (54) 606127
46.7%

StreetNo
Real number (ℝ)

Distinct: 3090
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2946.1325
Minimum: 1
Maximum: 905629
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 57
Q1: 1050
Median: 2375
Q3: 5100
95-th percentile: 7070
Maximum: 905629
Range: 905628
Interquartile range (IQR): 4050

Descriptive statistics

Standard deviation: 3997.6662
Coefficient of variation (CV): 1.35692
Kurtosis: 33315.386
Mean: 2946.1325
Median Absolute Deviation (MAD): 1655
Skewness: 147.65244
Sum: 2.2989261 × 10^8
Variance: 15981335
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
100 1101
 
1.4%
5100 601
 
0.8%
7205 520
 
0.7%
1250 448
 
0.6%
1 442
 
0.6%
2000 383
 
0.5%
1550 359
 
0.5%
50 313
 
0.4%
4141 310
 
0.4%
2425 304
 
0.4%
Other values (3080) 73251
93.9%
ValueCountFrequency (%)
1 442
0.6%
2 198
0.3%
3 200
0.3%
4 154
 
0.2%
5 7
 
< 0.1%
6 33
 
< 0.1%
7 25
 
< 0.1%
8 21
 
< 0.1%
9 20
 
< 0.1%
10 154
 
0.2%
ValueCountFrequency (%)
905629 1
 
< 0.1%
7895 138
0.2%
7890 7
 
< 0.1%
7885 79
0.1%
7880 6
 
< 0.1%
7875 30
 
< 0.1%
7860 5
 
< 0.1%
7855 5
 
< 0.1%
7850 4
 
< 0.1%
7840 1
 
< 0.1%
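The largest StreetNo value (905629, occurring once) is far above the next largest (7895), and a single value of that size can dominate the skewness. The following is a minimal sketch of that effect using a plain moment-based skewness on a small illustrative sample. The sample reuses a few street numbers from the tables above and is not the real column, and the estimator here may differ slightly from the one the report uses.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5

# Illustrative sample built from common street numbers in the report.
typical = [100, 5100, 7205, 1250, 1, 2000, 1550, 50, 4141, 2425]
with_outlier = typical + [905629]

print("without outlier:", round(skewness(typical), 3))
print("with outlier:   ", round(skewness(with_outlier), 3))
```

Adding the single extreme value raises the skewness sharply even in this tiny sample, which is why checking whether 905629 is a data-entry error seems a reasonable first step before interpreting the skewness alert.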

StreetName
Categorical

Distinct: 669
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB
Dundas St E
 
3202
Matheson Blvd E
 
2125
Dixie Rd
 
1982
Hurontario St
 
1971
Lakeshore Rd E
 
1628
Other values (664)
67124 

Length

Max length: 26
Median length: 22
Mean length: 11.945035
Min length: 3

Characters and Unicode

Total characters: 932095
Distinct characters: 53
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 57
Unique (%): 0.1%

Sample

1st row: Ambassador Dr
2nd row: Ambassador Dr
3rd row: Ambassador Dr
4th row: Ambassador Dr
5th row: Ambassador Dr

Common Values

ValueCountFrequency (%)
Dundas St E 3202
 
4.1%
Matheson Blvd E 2125
 
2.7%
Dixie Rd 1982
 
2.5%
Hurontario St 1971
 
2.5%
Lakeshore Rd E 1628
 
2.1%
Dundas St W 1586
 
2.0%
City Centre Dr 1528
 
2.0%
Britannia Rd E 1441
 
1.8%
Tomken Rd 1416
 
1.8%
Argentia Rd 1400
 
1.8%
Other values (659) 59753
76.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
rd 28598
 
15.4%
dr 17906
 
9.7%
e 12045
 
6.5%
st 9954
 
5.4%
blvd 8011
 
4.3%
w 7247
 
3.9%
dundas 4805
 
2.6%
ave 3978
 
2.1%
matheson 2625
 
1.4%
pky 2575
 
1.4%
Other values (665) 87802
47.3%

Most occurring characters

ValueCountFrequency (%)
107515
 
11.5%
r 77031
 
8.3%
e 71980
 
7.7%
a 58785
 
6.3%
d 55948
 
6.0%
n 49725
 
5.3%
t 47986
 
5.1%
i 45031
 
4.8%
o 36410
 
3.9%
l 32503
 
3.5%
Other values (43) 349181
37.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 636923
68.3%
Uppercase Letter 187126
 
20.1%
Space Separator 107515
 
11.5%
Dash Punctuation 480
 
0.1%
Other Punctuation 51
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 77031
12.1%
e 71980
11.3%
a 58785
9.2%
d 55948
8.8%
n 49725
 
7.8%
t 47986
 
7.5%
i 45031
 
7.1%
o 36410
 
5.7%
l 32503
 
5.1%
s 27702
 
4.3%
Other values (15) 133822
21.0%
Uppercase Letter
ValueCountFrequency (%)
R 31747
17.0%
D 29017
15.5%
S 18788
10.0%
E 16439
8.8%
B 14481
7.7%
C 13374
7.1%
W 11747
 
6.3%
M 9514
 
5.1%
A 9382
 
5.0%
T 6500
 
3.5%
Other values (14) 26137
14.0%
Other Punctuation
ValueCountFrequency (%)
' 45
88.2%
. 6
 
11.8%
Space Separator
ValueCountFrequency (%)
107515
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 480
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 824049
88.4%
Common 108046
 
11.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 77031
 
9.3%
e 71980
 
8.7%
a 58785
 
7.1%
d 55948
 
6.8%
n 49725
 
6.0%
t 47986
 
5.8%
i 45031
 
5.5%
o 36410
 
4.4%
l 32503
 
3.9%
R 31747
 
3.9%
Other values (39) 316903
38.5%
Common
ValueCountFrequency (%)
107515
99.5%
- 480
 
0.4%
' 45
 
< 0.1%
. 6
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 932095
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
107515
 
11.5%
r 77031
 
8.3%
e 71980
 
7.7%
a 58785
 
6.3%
d 55948
 
6.0%
n 49725
 
5.3%
t 47986
 
5.1%
i 45031
 
4.8%
o 36410
 
3.9%
l 32503
 
3.5%
Other values (43) 349181
37.5%

BldgNo
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct: 94
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB
73798 
Bldg 2
 
897
Bldg 1
 
858
Bldg A
 
426
Bldg B
 
348
Other values (89)
 
1705

Length

Max length: 18
Median length: 1
Mean length: 1.2798339
Min length: 1

Characters and Unicode

Total characters: 99868
Distinct characters: 53
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 24
Unique (%): < 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
73798
94.6%
Bldg 2 897
 
1.1%
Bldg 1 858
 
1.1%
Bldg A 426
 
0.5%
Bldg B 348
 
0.4%
Bldg 3 292
 
0.4%
Bldg 4 221
 
0.3%
Bldg K 135
 
0.2%
Bldg C 97
 
0.1%
East Tower 67
 
0.1%
Other values (84) 893
 
1.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
bldg 3720
44.4%
1 943
 
11.3%
2 941
 
11.2%
a 448
 
5.3%
b 372
 
4.4%
3 321
 
3.8%
4 276
 
3.3%
plaza 169
 
2.0%
k 135
 
1.6%
tower 118
 
1.4%
Other values (58) 931
 
11.1%

Most occurring characters

ValueCountFrequency (%)
77939
78.0%
B 4161
 
4.2%
l 3969
 
4.0%
g 3806
 
3.8%
d 3752
 
3.8%
1 1103
 
1.1%
2 1002
 
1.0%
a 514
 
0.5%
A 454
 
0.5%
3 326
 
0.3%
Other values (43) 2842
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Space Separator 77939
78.0%
Lowercase Letter 13394
 
13.4%
Uppercase Letter 5595
 
5.6%
Decimal Number 2933
 
2.9%
Other Punctuation 5
 
< 0.1%
Dash Punctuation 2
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
B 4161
74.4%
A 454
 
8.1%
P 170
 
3.0%
K 135
 
2.4%
E 119
 
2.1%
T 115
 
2.1%
C 106
 
1.9%
H 83
 
1.5%
D 57
 
1.0%
W 51
 
0.9%
Other values (10) 144
 
2.6%
Lowercase Letter
ValueCountFrequency (%)
l 3969
29.6%
g 3806
28.4%
d 3752
28.0%
a 514
 
3.8%
e 269
 
2.0%
r 225
 
1.7%
z 169
 
1.3%
o 151
 
1.1%
t 149
 
1.1%
s 121
 
0.9%
Other values (10) 269
 
2.0%
Decimal Number
ValueCountFrequency (%)
1 1103
37.6%
2 1002
34.2%
3 326
 
11.1%
4 279
 
9.5%
9 45
 
1.5%
6 43
 
1.5%
5 40
 
1.4%
7 39
 
1.3%
0 33
 
1.1%
8 23
 
0.8%
Space Separator
ValueCountFrequency (%)
77939
100.0%
Other Punctuation
ValueCountFrequency (%)
& 5
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 80879
81.0%
Latin 18989
 
19.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
B 4161
21.9%
l 3969
20.9%
g 3806
20.0%
d 3752
19.8%
a 514
 
2.7%
A 454
 
2.4%
e 269
 
1.4%
r 225
 
1.2%
P 170
 
0.9%
z 169
 
0.9%
Other values (30) 1500
 
7.9%
Common
ValueCountFrequency (%)
77939
96.4%
1 1103
 
1.4%
2 1002
 
1.2%
3 326
 
0.4%
4 279
 
0.3%
9 45
 
0.1%
6 43
 
0.1%
5 40
 
< 0.1%
7 39
 
< 0.1%
0 33
 
< 0.1%
Other values (3) 30
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 99868
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
77939
78.0%
B 4161
 
4.2%
l 3969
 
4.0%
g 3806
 
3.8%
d 3752
 
3.8%
1 1103
 
1.1%
2 1002
 
1.0%
a 514
 
0.5%
A 454
 
0.5%
3 326
 
0.3%
Other values (43) 2842
 
2.8%

UnitNo
Categorical

Distinct: 3322
Distinct (%): 4.3%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB
24367 
1
 
2763
2
 
2226
3
 
1941
4
 
1823
Other values (3317)
44912 

Length

Max length: 39
Median length: 1
Mean length: 2.2311488
Min length: 1

Characters and Unicode

Total characters: 174101
Distinct characters: 69
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1140
Unique (%): 1.5%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
24367
31.2%
1 2763
 
3.5%
2 2226
 
2.9%
3 1941
 
2.5%
4 1823
 
2.3%
5 1597
 
2.0%
6 1483
 
1.9%
7 1286
 
1.6%
8 1182
 
1.5%
9 993
 
1.3%
Other values (3312) 38371
49.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1 3419
 
5.4%
to 2757
 
4.4%
2 2631
 
4.2%
3 2352
 
3.7%
4 2199
 
3.5%
5 2002
 
3.2%
6 1816
 
2.9%
7 1704
 
2.7%
8 1577
 
2.5%
9 1315
 
2.1%
Other values (2160) 41469
65.6%

Most occurring characters

ValueCountFrequency (%)
34098
19.6%
1 28398
16.3%
2 18376
10.6%
0 18283
10.5%
3 10149
 
5.8%
4 8314
 
4.8%
5 7050
 
4.0%
6 5941
 
3.4%
7 5008
 
2.9%
8 4658
 
2.7%
Other values (59) 33826
19.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 109874
63.1%
Space Separator 34098
 
19.6%
Lowercase Letter 12491
 
7.2%
Uppercase Letter 10697
 
6.1%
Other Punctuation 4969
 
2.9%
Dash Punctuation 1812
 
1.0%
Close Punctuation 70
 
< 0.1%
Open Punctuation 70
 
< 0.1%
Math Symbol 15
 
< 0.1%
Control 5
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A 2952
27.6%
B 2399
22.4%
C 992
 
9.3%
F 814
 
7.6%
D 563
 
5.3%
E 562
 
5.3%
H 362
 
3.4%
L 333
 
3.1%
G 324
 
3.0%
J 189
 
1.8%
Other values (15) 1207
11.3%
Lowercase Letter
ValueCountFrequency (%)
o 3780
30.3%
t 3301
26.4%
r 843
 
6.7%
l 756
 
6.1%
e 737
 
5.9%
n 464
 
3.7%
a 444
 
3.6%
s 410
 
3.3%
p 278
 
2.2%
d 261
 
2.1%
Other values (13) 1217
 
9.7%
Decimal Number
ValueCountFrequency (%)
1 28398
25.8%
2 18376
16.7%
0 18283
16.6%
3 10149
 
9.2%
4 8314
 
7.6%
5 7050
 
6.4%
6 5941
 
5.4%
7 5008
 
4.6%
8 4658
 
4.2%
9 3697
 
3.4%
Other Punctuation
ValueCountFrequency (%)
& 3862
77.7%
, 1058
 
21.3%
. 28
 
0.6%
/ 20
 
0.4%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
34098
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1812
100.0%
Close Punctuation
ValueCountFrequency (%)
) 70
100.0%
Open Punctuation
ValueCountFrequency (%)
( 70
100.0%
Math Symbol
ValueCountFrequency (%)
+ 15
100.0%
Control
ValueCountFrequency (%)
5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 150913
86.7%
Latin 23188
 
13.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 3780
16.3%
t 3301
14.2%
A 2952
12.7%
B 2399
 
10.3%
C 992
 
4.3%
r 843
 
3.6%
F 814
 
3.5%
l 756
 
3.3%
e 737
 
3.2%
D 563
 
2.4%
Other values (38) 6051
26.1%
Common
ValueCountFrequency (%)
34098
22.6%
1 28398
18.8%
2 18376
12.2%
0 18283
12.1%
3 10149
 
6.7%
4 8314
 
5.5%
5 7050
 
4.7%
6 5941
 
3.9%
7 5008
 
3.3%
8 4658
 
3.1%
Other values (11) 10638
 
7.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 174100
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
34098
19.6%
1 28398
16.3%
2 18376
10.6%
0 18283
10.5%
3 10149
 
5.8%
4 8314
 
4.8%
5 7050
 
4.0%
6 5941
 
3.4%
7 5008
 
2.9%
8 4658
 
2.7%
Other values (58) 33825
19.4%
Punctuation
ValueCountFrequency (%)
1
100.0%

PostalCode
Categorical

Distinct: 2901
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB
L5B 2C9
 
768
L5M 4Z5
 
523
L4T 2T9
 
477
L5E 1V4
 
394
L5P 1B2
 
386
Other values (2896)
75484 

Length

Max length: 33
Median length: 7
Mean length: 6.995425
Min length: 1

Characters and Unicode

Total characters: 545867
Distinct characters: 47
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 138
Unique (%): 0.2%

Sample

1st row: L5T 2J3
2nd row: L5T 2J3
3rd row: L5T 2J3
4th row: L5T 2J3
5th row: L5T 2J3

Common Values

ValueCountFrequency (%)
L5B 2C9 768
 
1.0%
L5M 4Z5 523
 
0.7%
L4T 2T9 477
 
0.6%
L5E 1V4 394
 
0.5%
L5P 1B2 386
 
0.5%
L5C 1V8 332
 
0.4%
L5J 1K5 296
 
0.4%
L4W 5G6 284
 
0.4%
L4X 1L4 249
 
0.3%
L5B 1M7 247
 
0.3%
Other values (2891) 74076
94.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
l4w 12403
 
8.0%
l5t 8317
 
5.3%
l5n 6069
 
3.9%
l4z 4948
 
3.2%
l5l 4693
 
3.0%
l5b 4588
 
2.9%
l5s 4258
 
2.7%
l5m 3801
 
2.4%
l4t 3311
 
2.1%
l5a 3290
 
2.1%
Other values (1077) 100200
64.3%

Most occurring characters

ValueCountFrequency (%)
L 86506
15.8%
77968
14.3%
5 63752
11.7%
4 47370
 
8.7%
1 39205
 
7.2%
2 25913
 
4.7%
3 16425
 
3.0%
W 16127
 
3.0%
T 14622
 
2.7%
6 11449
 
2.1%
Other values (37) 146530
26.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 233941
42.9%
Decimal Number 233912
42.9%
Space Separator 77968
 
14.3%
Lowercase Letter 32
 
< 0.1%
Control 14
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
L 86506
37.0%
W 16127
 
6.9%
T 14622
 
6.3%
N 9608
 
4.1%
A 9326
 
4.0%
B 8748
 
3.7%
Z 8458
 
3.6%
M 7909
 
3.4%
C 7879
 
3.4%
V 7750
 
3.3%
Other values (12) 57008
24.4%
Lowercase Letter
ValueCountFrequency (%)
k 9
28.1%
c 5
15.6%
l 5
15.6%
s 3
 
9.4%
d 2
 
6.2%
t 2
 
6.2%
h 1
 
3.1%
i 1
 
3.1%
a 1
 
3.1%
g 1
 
3.1%
Other values (2) 2
 
6.2%
Decimal Number
ValueCountFrequency (%)
5 63752
27.3%
4 47370
20.3%
1 39205
16.8%
2 25913
11.1%
3 16425
 
7.0%
6 11449
 
4.9%
8 9658
 
4.1%
9 8878
 
3.8%
7 8525
 
3.6%
0 2737
 
1.2%
Control
ValueCountFrequency (%)
8
57.1%
6
42.9%
Space Separator
ValueCountFrequency (%)
77968
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 311894
57.1%
Latin 233973
42.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
L 86506
37.0%
W 16127
 
6.9%
T 14622
 
6.2%
N 9608
 
4.1%
A 9326
 
4.0%
B 8748
 
3.7%
Z 8458
 
3.6%
M 7909
 
3.4%
C 7879
 
3.4%
V 7750
 
3.3%
Other values (24) 57040
24.4%
Common
ValueCountFrequency (%)
77968
25.0%
5 63752
20.4%
4 47370
15.2%
1 39205
12.6%
2 25913
 
8.3%
3 16425
 
5.3%
6 11449
 
3.7%
8 9658
 
3.1%
9 8878
 
2.8%
7 8525
 
2.7%
Other values (3) 2751
 
0.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 545867
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
L 86506
15.8%
77968
14.3%
5 63752
11.7%
4 47370
 
8.7%
1 39205
 
7.2%
2 25913
 
4.7%
3 16425
 
3.0%
W 16127
 
3.0%
T 14622
 
2.7%
6 11449
 
2.1%
Other values (37) 146530
26.8%

Location
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct: 56
Distinct (%): 0.2%
Missing: 47693
Missing (%): 61.1%
Memory size: 609.8 KiB

Northeast EA (West)            8087
Gateway EA (East)              1828
Dixie EA                       1814
Meadowvale Business Park CC    1734
Western Business Park EA       1580
Other values (51)              15296
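
The percentages above can be checked by hand. The sketch below assumes that Missing (%) is taken over all 78032 observations from the dataset overview, while Distinct (%) is taken over the non-missing rows only; the second assumption is inferred from the reported 0.2% and is not stated in the report.

```python
n_observations = 78032   # from the dataset overview
n_missing = 47693        # Location: Missing
n_distinct = 56          # Location: Distinct

missing_pct = 100 * n_missing / n_observations
n_present = n_observations - n_missing
# Assumption: the distinct share uses only the non-missing rows as denominator.
distinct_pct = 100 * n_distinct / n_present

print(f"Missing (%): {missing_pct:.1f}")
print(f"Distinct (%): {distinct_pct:.1f}")
print(f"Non-missing rows: {n_present}")
```

The non-missing row count, 30339, also equals the sum of the six counts listed above, which is a quick consistency check on the table.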

Length

Max length: 27
Median length: 23
Mean length: 16.483866
Min length: 7

Characters and Unicode

Total characters: 500104
Distinct characters: 43
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Gateway EA (East)
2nd row: Gateway EA (East)
3rd row: Gateway EA (East)
4th row: Gateway EA (East)
5th row: Gateway EA (East)

Common Values

Value                          Count    Frequency (%)
Northeast EA (West)            8087     10.4%
Gateway EA (East)              1828     2.3%
Dixie EA                       1814     2.3%
Meadowvale Business Park CC    1734     2.2%
Western Business Park EA       1580     2.0%
DT Core                        1256     1.6%
DT Cooksville                  931      1.2%
Airport CC                     906      1.2%
Northeast EA (East)            738      0.9%
Mavis-Erindale EA              719      0.9%
Other values (46)              10746    13.8%
(Missing)                      47693    61.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ea 15721
18.7%
northeast 8825
 
10.5%
west 8730
 
10.4%
nhd 5805
 
6.9%
park 3715
 
4.4%
east 3604
 
4.3%
business 3314
 
3.9%
cc 3101
 
3.7%
gateway 2618
 
3.1%
dt 2576
 
3.1%
Other values (45) 25930
30.9%

Most occurring characters

ValueCountFrequency (%)
53600
 
10.7%
e 44801
 
9.0%
t 42033
 
8.4%
s 38109
 
7.6%
a 32858
 
6.6%
r 25884
 
5.2%
o 23256
 
4.7%
E 21305
 
4.3%
i 18674
 
3.7%
A 17879
 
3.6%
Other values (33) 181705
36.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 300748
60.1%
Uppercase Letter 120982
24.2%
Space Separator 53600
 
10.7%
Open Punctuation 11741
 
2.3%
Close Punctuation 11741
 
2.3%
Dash Punctuation 1292
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 44801
14.9%
t 42033
14.0%
s 38109
12.7%
a 32858
10.9%
r 25884
8.6%
o 23256
7.7%
i 18674
6.2%
l 13559
 
4.5%
n 10785
 
3.6%
h 10586
 
3.5%
Other values (11) 40203
13.4%
Uppercase Letter
ValueCountFrequency (%)
E 21305
17.6%
A 17879
14.8%
N 17487
14.5%
C 14057
11.6%
W 10310
8.5%
D 10195
8.4%
H 6417
 
5.3%
M 5583
 
4.6%
P 4710
 
3.9%
B 3314
 
2.7%
Other values (8) 9725
8.0%
Space Separator
ValueCountFrequency (%)
53600
100.0%
Open Punctuation
ValueCountFrequency (%)
( 11741
100.0%
Close Punctuation
ValueCountFrequency (%)
) 11741
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1292
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 421730
84.3%
Common 78374
 
15.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 44801
 
10.6%
t 42033
 
10.0%
s 38109
 
9.0%
a 32858
 
7.8%
r 25884
 
6.1%
o 23256
 
5.5%
E 21305
 
5.1%
i 18674
 
4.4%
A 17879
 
4.2%
N 17487
 
4.1%
Other values (29) 139444
33.1%
Common
ValueCountFrequency (%)
53600
68.4%
( 11741
 
15.0%
) 11741
 
15.0%
- 1292
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 500104
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
53600
 
10.7%
e 44801
 
9.0%
t 42033
 
8.4%
s 38109
 
7.6%
a 32858
 
6.6%
r 25884
 
5.2%
o 23256
 
4.7%
E 21305
 
4.3%
i 18674
 
3.7%
A 17879
 
3.6%
Other values (33) 181705
36.3%

Ward
Real number (ℝ)

Distinct: 11
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.3913395
Minimum: 1
Maximum: 11
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
median: 5
Q3: 7
95-th percentile: 11
Maximum: 11
Range: 10
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.4758594
Coefficient of variation (CV): 0.459229
Kurtosis: 0.01057504
Mean: 5.3913395
Median Absolute Deviation (MAD): 1
Skewness: 0.34308626
Sum: 420697
Variance: 6.12988
Monotonicity: Not monotonic
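
Because Ward takes only 11 distinct values, the statistics above can be rebuilt from the per-ward counts in this section's frequency tables. The sketch below is a cross-check under two assumptions: the variance uses the sample (n - 1) denominator, and the `quantile` helper is a simplified, non-interpolating stand-in rather than the report's own method.

```python
import math

# Ward -> count, copied from this section's frequency tables.
ward_counts = {1: 6772, 2: 3163, 3: 5005, 4: 4163, 5: 33956, 6: 3584,
               7: 5561, 8: 6086, 9: 4687, 10: 755, 11: 4300}

n = sum(ward_counts.values())
total = sum(w * c for w, c in ward_counts.items())
mean = total / n
# Sample variance: squared deviations weighted by count, divided by n - 1.
variance = sum(c * (w - mean) ** 2 for w, c in ward_counts.items()) / (n - 1)
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation

def quantile(q):
    """Value at 0-based sorted position int((n - 1) * q), without interpolation."""
    position = int((n - 1) * q)
    seen = 0
    for ward in sorted(ward_counts):
        seen += ward_counts[ward]
        if position < seen:
            return ward

iqr = quantile(0.75) - quantile(0.25)
print(n, total, round(mean, 7), round(variance, 5), round(cv, 6), iqr)
```

The counts add up to the 78032 observations in the dataset overview, and the weighted sum reproduces the reported Sum of 420697. Ward is an identifier, so the mean and variance describe the coding rather than any measured quantity.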
Histogram with fixed size bins (bins=11)
Value    Count    Frequency (%)
5        33956    43.5%
1        6772     8.7%
8        6086     7.8%
7        5561     7.1%
3        5005     6.4%
9        4687     6.0%
11       4300     5.5%
4        4163     5.3%
6        3584     4.6%
2        3163     4.1%
Minimum 10 values
Value    Count    Frequency (%)
1        6772     8.7%
2        3163     4.1%
3        5005     6.4%
4        4163     5.3%
5        33956    43.5%
6        3584     4.6%
7        5561     7.1%
8        6086     7.8%
9        4687     6.0%
10       755      1.0%
Maximum 10 values
Value    Count    Frequency (%)
11       4300     5.5%
10       755      1.0%
9        4687     6.0%
8        6086     7.8%
7        5561     7.1%
6        3584     4.6%
5        33956    43.5%
4        4163     5.3%
3        5005     6.4%
2        3163     4.1%

NAICSCode
Real number (ℝ)

Distinct: 715
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 532884.63
Minimum: 23829
Maximum: 913910
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 609.8 KiB

Quantile statistics

Minimum: 23829
5-th percentile: 315239
Q1: 417930
median: 524210
Q3: 621330
95-th percentile: 812116
Maximum: 913910
Range: 890081
Interquartile range (IQR): 203400

Descriptive statistics

Standard deviation: 158671.14
Coefficient of variation (CV): 0.29775891
Kurtosis: -0.65947378
Mean: 532884.63
Median Absolute Deviation (MAD): 97300
Skewness: 0.31162396
Sum: 4.1582053 × 10^10
Variance: 2.5176532 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count    Frequency (%)
722512                3657     4.7%
811111                1997     2.6%
722511                1782     2.3%
621210                1610     2.1%
621110                1513     1.9%
541110                1384     1.8%
812115                1308     1.7%
488519                1264     1.6%
611110                1242     1.6%
813110                1101     1.4%
Other values (705)    61174    78.4%
Minimum 10 values
Value     Count    Frequency (%)
23829     1        < 0.1%
44612     3        < 0.1%
44812     1        < 0.1%
54111     4        < 0.1%
111999    1        < 0.1%
112999    3        < 0.1%
115110    2        < 0.1%
212299    6        < 0.1%
213118    3        < 0.1%
213119    6        < 0.1%
Maximum 10 values
Value     Count    Frequency (%)
913910    103      0.1%
913140    101      0.1%
913130    6        < 0.1%
912910    37       < 0.1%
912210    27       < 0.1%
912190    12       < 0.1%
912150    3        < 0.1%
912130    5        < 0.1%
912120    3        < 0.1%
912110    1        < 0.1%
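
NAICSCode is profiled as a real number, but its values are classification codes, so the mean and variance above carry little meaning. The smallest values listed are five digits long, whereas the rest are six, which may indicate mixed classification levels or truncated codes. The sketch below is one way to flag such codes and derive the two-digit sector; the function name `sector` and the short sample list are assumptions, and the full column is not reproduced here.

```python
from collections import Counter

# A few codes copied from this section's tables (not the full column).
codes = [23829, 44612, 54111, 111999, 722512, 811111, 913910]

def sector(code: int) -> str:
    """Return the leading two digits, which identify the NAICS sector."""
    return str(code)[:2]

# How many codes have each digit length, and which ones are not six digits long.
length_counts = Counter(len(str(code)) for code in codes)
short_codes = [code for code in codes if len(str(code)) != 6]

print(length_counts)
print(short_codes)
print({code: sector(code) for code in codes})
```

Treating the column as a string or categorical type before profiling would avoid the numeric summary altogether.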

NAICSCat
Categorical

Distinct: 33
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

Manufacturing        9682
Other Services       9053
Retail               8775
Wholesale            6955
Professional         5672
Other values (28)    37895

Length

Max length: 50
Median length: 39
Mean length: 13.436295
Min length: 4

Characters and Unicode

Total characters: 1048461
Distinct characters: 37
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Wholesale
2nd row: Manufacturing
3rd row: Manufacturing
4th row: Manufacturing
5th row: Wholesale

Common Values

Value                Count    Frequency (%)
Manufacturing        9682     12.4%
Other Services       9053     11.6%
Retail               8775     11.2%
Wholesale            6955     8.9%
Professional         5672     7.3%
Health Care          5141     6.6%
Accommodation        4936     6.3%
Transportation       3046     3.9%
Construction         2783     3.6%
Educational          2438     3.1%
Other values (23)    19551    25.1%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
services 12307
 
10.0%
retail 11071
 
9.0%
manufacturing 9682
 
7.9%
other 9053
 
7.4%
wholesale 8749
 
7.1%
and 7484
 
6.1%
professional 7102
 
5.8%
health 6459
 
5.3%
care 6459
 
5.3%
accommodation 6148
 
5.0%
Other values (36) 37992
31.0%

Most occurring characters

ValueCountFrequency (%)
a 109575
 
10.5%
e 105760
 
10.1%
i 79617
 
7.6%
n 77107
 
7.4%
t 76590
 
7.3%
r 66562
 
6.3%
o 64715
 
6.2%
s 54566
 
5.2%
c 52730
 
5.0%
l 50880
 
4.9%
Other values (27) 310359
29.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 886161
84.5%
Uppercase Letter 115659
 
11.0%
Space Separator 44474
 
4.2%
Other Punctuation 2167
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 109575
12.4%
e 105760
11.9%
i 79617
9.0%
n 77107
8.7%
t 76590
8.6%
r 66562
7.5%
o 64715
7.3%
s 54566
 
6.2%
c 52730
 
6.0%
l 50880
 
5.7%
Other values (10) 148059
16.7%
Uppercase Letter
ValueCountFrequency (%)
S 15580
13.5%
R 14001
12.1%
A 12313
10.6%
M 10711
9.3%
W 10015
8.7%
C 9462
8.2%
T 9307
8.0%
O 9053
7.8%
P 7583
6.6%
H 6459
5.6%
Other values (5) 11175
9.7%
Space Separator
ValueCountFrequency (%)
44474
100.0%
Other Punctuation
ValueCountFrequency (%)
, 2167
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1001820
95.6%
Common 46641
 
4.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 109575
10.9%
e 105760
10.6%
i 79617
 
7.9%
n 77107
 
7.7%
t 76590
 
7.6%
r 66562
 
6.6%
o 64715
 
6.5%
s 54566
 
5.4%
c 52730
 
5.3%
l 50880
 
5.1%
Other values (25) 263718
26.3%
Common
ValueCountFrequency (%)
44474
95.4%
, 2167
 
4.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1048461
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 109575
 
10.5%
e 105760
 
10.1%
i 79617
 
7.6%
n 77107
 
7.4%
t 76590
 
7.3%
r 66562
 
6.3%
o 64715
 
6.2%
s 54566
 
5.2%
c 52730
 
5.0%
l 50880
 
4.9%
Other values (27) 310359
29.6%

NAICSDescr
Categorical

Distinct: 1039
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

Limited-service eating places    3647
General Automotive Repair        1992
Full-service restaurants         1777
Offices of Dentists              1603
Offices of Physicians            1504
Other values (1034)              67509

Length

Max length: 175
Median length: 80
Mean length: 35.436385
Min length: 6

Characters and Unicode

Total characters: 2765172
Distinct characters: 61
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 124
Unique (%): 0.2%

Sample

1st row: Amusement and Sporting Goods Wholesaler-Distributors
2nd row: Support Activities for Printing
3rd row: Support Activities for Printing
4th row: Other Printing
5th row: Industrial Machinery, Equipment and Supplies Wholesaler-Distributors

Common Values

Value                                       Count    Frequency (%)
Limited-service eating places               3647     4.7%
General Automotive Repair                   1992     2.6%
Full-service restaurants                    1777     2.3%
Offices of Dentists                         1603     2.1%
Offices of Physicians                       1504     1.9%
Offices of Lawyers                          1376     1.8%
Beauty Salons                               1302     1.7%
Other Freight Transportation Arrangement    1255     1.6%
Elementary and Secondary Schools            1240     1.6%
Religious Organizations                     1098     1.4%
Other values (1029)                         61238    78.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
and 33347
 
10.0%
other 18681
 
5.6%
stores 9245
 
2.8%
offices 8694
 
2.6%
of 8405
 
2.5%
services 8315
 
2.5%
all 8273
 
2.5%
wholesaler-distributors 7178
 
2.1%
manufacturing 6730
 
2.0%
supplies 4486
 
1.3%
Other values (1054) 221747
66.2%

Most occurring characters

ValueCountFrequency (%)
e 278627
 
10.1%
258164
 
9.3%
i 198022
 
7.2%
r 189307
 
6.8%
n 183101
 
6.6%
t 181749
 
6.6%
a 181007
 
6.5%
s 160174
 
5.8%
o 139412
 
5.0%
l 115516
 
4.2%
Other values (51) 880093
31.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2193494
79.3%
Uppercase Letter 276079
 
10.0%
Space Separator 258605
 
9.4%
Dash Punctuation 17709
 
0.6%
Other Punctuation 11390
 
0.4%
Open Punctuation 4149
 
0.2%
Close Punctuation 3340
 
0.1%
Control 406
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 278627
12.7%
i 198022
9.0%
r 189307
8.6%
n 183101
 
8.3%
t 181749
 
8.3%
a 181007
 
8.3%
s 160174
 
7.3%
o 139412
 
6.4%
l 115516
 
5.3%
c 105666
 
4.8%
Other values (16) 460913
21.0%
Uppercase Letter
ValueCountFrequency (%)
S 38648
14.0%
O 30856
11.2%
A 24817
 
9.0%
C 24436
 
8.9%
M 21775
 
7.9%
P 18986
 
6.9%
D 14648
 
5.3%
W 12588
 
4.6%
E 11736
 
4.3%
F 11266
 
4.1%
Other values (15) 66323
24.0%
Other Punctuation
ValueCountFrequency (%)
, 9665
84.9%
' 803
 
7.1%
& 488
 
4.3%
. 434
 
3.8%
Space Separator
ValueCountFrequency (%)
258164
99.8%
  441
 
0.2%
Dash Punctuation
ValueCountFrequency (%)
- 17709
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4149
100.0%
Close Punctuation
ValueCountFrequency (%)
) 3340
100.0%
Control
ValueCountFrequency (%)
406
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 2469573
89.3%
Common 295599
 
10.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 278627
 
11.3%
i 198022
 
8.0%
r 189307
 
7.7%
n 183101
 
7.4%
t 181749
 
7.4%
a 181007
 
7.3%
s 160174
 
6.5%
o 139412
 
5.6%
l 115516
 
4.7%
c 105666
 
4.3%
Other values (41) 736992
29.8%
Common
ValueCountFrequency (%)
258164
87.3%
- 17709
 
6.0%
, 9665
 
3.3%
( 4149
 
1.4%
) 3340
 
1.1%
' 803
 
0.3%
& 488
 
0.2%
  441
 
0.1%
. 434
 
0.1%
406
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2764731
> 99.9%
None 441
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 278627
 
10.1%
258164
 
9.3%
i 198022
 
7.2%
r 189307
 
6.8%
n 183101
 
6.6%
t 181749
 
6.6%
a 181007
 
6.5%
s 160174
 
5.8%
o 139412
 
5.0%
l 115516
 
4.2%
Other values (50) 879652
31.8%
None
ValueCountFrequency (%)
  441
100.0%

Phone
Categorical

Distinct: 25064
Distinct (%): 32.1%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

(blank)                 1457
905-615-3200            40
905-624-3811            35
000-000-0000            35
905-615-3777            24
Other values (25059)    76441

Length

Max length: 20
Median length: 12
Mean length: 11.66665
Min length: 1

Characters and Unicode

Total characters: 910372
Distinct characters: 21
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7404
Unique (%): 9.5%

Sample

1st row: 905-795-8900
2nd row: 905-795-9575
3rd row: 905-795-9519
4th row: 905-564-8121
5th row: 905-564-8080

Common Values

Value                   Count    Frequency (%)
(blank)                 1457     1.9%
905-615-3200            40       0.1%
905-624-3811            35       < 0.1%
000-000-0000            35       < 0.1%
905-615-3777            24       < 0.1%
905-677-9354            21       < 0.1%
905-670-4070            20       < 0.1%
905-615-4640            20       < 0.1%
905-615-4750            20       < 0.1%
905-615-4653            18       < 0.1%
Other values (25054)    76342    97.8%
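
Phone reports 0 missing cells, yet its most frequent value is blank (1457 rows) and 000-000-0000 appears 35 times. Both look like placeholders rather than real numbers. The sketch below shows one way to map them to missing before analysis; the `PLACEHOLDERS` set and the `clean_phone` helper are hypothetical, and whether these values really mean "unknown" is an assumption to confirm with the data owner.

```python
# Assumed placeholder values: an empty string (after stripping) and an all-zero number.
PLACEHOLDERS = {"", "000-000-0000"}

def clean_phone(value):
    """Return None for blank or placeholder phone numbers, else the stripped value."""
    stripped = value.strip()
    return None if stripped in PLACEHOLDERS else stripped

sample = [" ", "905-615-3200", "000-000-0000", "905-624-3811"]
cleaned = [clean_phone(v) for v in sample]
print(cleaned)
```

The same approach would apply to Fax, TollFree, EMail and WebAddress, which also report no missing cells while a blank entry is their most common value.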

Length

Histogram of lengths of the category
ValueCountFrequency (%)
905-615-3200 40
 
0.1%
000-000-0000 35
 
< 0.1%
905-624-3811 35
 
< 0.1%
905-615-3777 24
 
< 0.1%
905-677-9354 21
 
< 0.1%
905-670-4070 20
 
< 0.1%
905-615-4640 20
 
< 0.1%
905-615-4750 20
 
< 0.1%
905-615-4653 18
 
< 0.1%
905-949-2222 17
 
< 0.1%
Other values (25058) 76339
99.7%

Most occurring characters

ValueCountFrequency (%)
- 143126
15.7%
0 136708
15.0%
5 117584
12.9%
9 114775
12.6%
2 71077
7.8%
6 70911
7.8%
7 60427
6.6%
8 60294
6.6%
1 49065
 
5.4%
4 46596
 
5.1%
Other values (11) 39809
 
4.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 765753
84.1%
Dash Punctuation 143130
 
15.7%
Space Separator 1471
 
0.2%
Other Punctuation 9
 
< 0.1%
Lowercase Letter 7
 
< 0.1%
Uppercase Letter 2
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 136708
17.9%
5 117584
15.4%
9 114775
15.0%
2 71077
9.3%
6 70911
9.3%
7 60427
7.9%
8 60294
7.9%
1 49065
 
6.4%
4 46596
 
6.1%
3 38316
 
5.0%
Lowercase Letter
ValueCountFrequency (%)
o 2
28.6%
x 2
28.6%
t 2
28.6%
e 1
14.3%
Dash Punctuation
ValueCountFrequency (%)
- 143126
> 99.9%
4
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
. 6
66.7%
; 3
33.3%
Uppercase Letter
ValueCountFrequency (%)
E 1
50.0%
B 1
50.0%
Space Separator
ValueCountFrequency (%)
1471
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 910363
> 99.9%
Latin 9
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
- 143126
15.7%
0 136708
15.0%
5 117584
12.9%
9 114775
12.6%
2 71077
7.8%
6 70911
7.8%
7 60427
6.6%
8 60294
6.6%
1 49065
 
5.4%
4 46596
 
5.1%
Other values (5) 39800
 
4.4%
Latin
ValueCountFrequency (%)
o 2
22.2%
x 2
22.2%
t 2
22.2%
E 1
11.1%
e 1
11.1%
B 1
11.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 910368
> 99.9%
Punctuation 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 143126
15.7%
0 136708
15.0%
5 117584
12.9%
9 114775
12.6%
2 71077
7.8%
6 70911
7.8%
7 60427
6.6%
8 60294
6.6%
1 49065
 
5.4%
4 46596
 
5.1%
Other values (10) 39805
 
4.4%
Punctuation
ValueCountFrequency (%)
4
100.0%

Fax
Categorical

Distinct: 15752
Distinct (%): 20.2%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

(blank)                 29473
905-822-2673            41
905-361-6401            37
905-896-9380            31
905-502-6982            18
Other values (15747)    48432

Length

Max length: 14
Median length: 12
Mean length: 7.7664163
Min length: 1

Characters and Unicode

Total characters: 606029
Distinct characters: 12
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4752
Unique (%): 6.1%

Sample

1st row: 905-795-8988
2nd row: 905-795-8775
3rd row: 905-795-8775
4th row: 905-564-7395
5th row: 905-564-5003

Common Values

Value                   Count    Frequency (%)
(blank)                 29473    37.8%
905-822-2673            41       0.1%
905-361-6401            37       < 0.1%
905-896-9380            31       < 0.1%
905-502-6982            18       < 0.1%
905-625-4815            17       < 0.1%
905-542-0987            16       < 0.1%
905-607-9204            16       < 0.1%
905-625-8815            15       < 0.1%
905-403-8409            14       < 0.1%
Other values (15742)    48354    62.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
905-822-2673 41
 
0.1%
905-361-6401 37
 
0.1%
905-896-9380 31
 
0.1%
905-502-6982 18
 
< 0.1%
905-625-4815 17
 
< 0.1%
905-542-0987 16
 
< 0.1%
905-607-9204 16
 
< 0.1%
905-625-8815 15
 
< 0.1%
905-403-8409 14
 
< 0.1%
905-625-8245 13
 
< 0.1%
Other values (15742) 48342
99.6%

Most occurring characters

ValueCountFrequency (%)
- 90675
15.0%
0 79738
13.2%
5 78040
12.9%
9 75509
12.5%
6 47327
7.8%
2 44185
7.3%
8 39652
6.5%
7 37892
6.3%
1 30365
 
5.0%
29474
 
4.9%
Other values (2) 53172
8.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 485880
80.2%
Dash Punctuation 90675
 
15.0%
Space Separator 29474
 
4.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 79738
16.4%
5 78040
16.1%
9 75509
15.5%
6 47327
9.7%
2 44185
9.1%
8 39652
8.2%
7 37892
7.8%
1 30365
 
6.2%
4 27785
 
5.7%
3 25387
 
5.2%
Dash Punctuation
ValueCountFrequency (%)
- 90675
100.0%
Space Separator
ValueCountFrequency (%)
29474
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 606029
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
- 90675
15.0%
0 79738
13.2%
5 78040
12.9%
9 75509
12.5%
6 47327
7.8%
2 44185
7.3%
8 39652
6.5%
7 37892
6.3%
1 30365
 
5.0%
29474
 
4.9%
Other values (2) 53172
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 606029
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 90675
15.0%
0 79738
13.2%
5 78040
12.9%
9 75509
12.5%
6 47327
7.8%
2 44185
7.3%
8 39652
6.5%
7 37892
6.3%
1 30365
 
5.0%
29474
 
4.9%
Other values (2) 53172
8.8%

TollFree
Categorical

Distinct: 4117
Distinct (%): 5.3%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

(blank)                66596
1-800-769-2511         32
1-800-465-2422         32
1-800-472-6842         23
1-877-777-8672         16
Other values (4112)    11333

Length

Max length: 16
Median length: 1
Mean length: 2.8538933
Min length: 1

Characters and Unicode

Total characters: 222695
Distinct characters: 15
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1434
Unique (%): 1.8%

Sample

1st row: 1-800-668-1101
2nd row: (blank)
3rd row: (blank)
4th row: (blank)
5th row: (blank)

Common Values

Value                  Count    Frequency (%)
(blank)                66596    85.3%
1-800-769-2511         32       < 0.1%
1-800-465-2422         32       < 0.1%
1-800-472-6842         23       < 0.1%
1-877-777-8672         16       < 0.1%
1-877-849-3637         16       < 0.1%
1-866-567-8888         13       < 0.1%
1-800-668-0414         10       < 0.1%
1-800-956-9543         10       < 0.1%
1-866-829-9433         10       < 0.1%
Other values (4107)    11274    14.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1-800-769-2511 32
 
0.3%
1-800-465-2422 32
 
0.3%
1-800-472-6842 23
 
0.2%
1-877-777-8672 16
 
0.1%
1-877-849-3637 16
 
0.1%
1-866-567-8888 13
 
0.1%
1-877-526-6639 10
 
0.1%
1-800-254-0778 10
 
0.1%
1-800-563-4327 10
 
0.1%
1-866-829-9433 10
 
0.1%
Other values (4111) 11269
98.5%

Most occurring characters

ValueCountFrequency (%)
66601
29.9%
- 31297
14.1%
8 24221
 
10.9%
1 16130
 
7.2%
0 14466
 
6.5%
6 14461
 
6.5%
7 12782
 
5.7%
5 9818
 
4.4%
2 9799
 
4.4%
3 8526
 
3.8%
Other values (5) 14594
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 124793
56.0%
Space Separator 66601
29.9%
Dash Punctuation 31299
 
14.1%
Lowercase Letter 1
 
< 0.1%
Other Punctuation 1
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
8 24221
19.4%
1 16130
12.9%
0 14466
11.6%
6 14461
11.6%
7 12782
10.2%
5 9818
7.9%
2 9799
7.9%
3 8526
 
6.8%
4 7930
 
6.4%
9 6660
 
5.3%
Dash Punctuation
ValueCountFrequency (%)
- 31297
> 99.9%
2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
66601
100.0%
Lowercase Letter
ValueCountFrequency (%)
x 1
100.0%
Other Punctuation
ValueCountFrequency (%)
. 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 222694
> 99.9%
Latin 1
 
< 0.1%

Most frequent character per script

Common
ValueCountFrequency (%)
66601
29.9%
- 31297
14.1%
8 24221
 
10.9%
1 16130
 
7.2%
0 14466
 
6.5%
6 14461
 
6.5%
7 12782
 
5.7%
5 9818
 
4.4%
2 9799
 
4.4%
3 8526
 
3.8%
Other values (4) 14593
 
6.6%
Latin
ValueCountFrequency (%)
x 1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 222693
> 99.9%
Punctuation 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
66601
29.9%
- 31297
14.1%
8 24221
 
10.9%
1 16130
 
7.2%
0 14466
 
6.5%
6 14461
 
6.5%
7 12782
 
5.7%
5 9818
 
4.4%
2 9799
 
4.4%
3 8526
 
3.8%
Other values (4) 14592
 
6.6%
Punctuation
ValueCountFrequency (%)
2
100.0%

EMail
Categorical

Distinct: 15058
Distinct (%): 19.3%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

(blank)                             30506
info@publicstoragecanada.com        21
info@taxwide.com                    20
info@ucmas.ca                       13
info@mississaugaschoolofmusic.ca    13
Other values (15053)                47459

Length

Max length: 97
Median length: 55
Mean length: 14.085132
Min length: 1

Characters and Unicode

Total characters: 1099091
Distinct characters: 78
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 3361
Unique (%): 4.3%

Sample

1st row: lfinch@golftrendsinc.com
2nd row: prepress@apexgraphics.com
3rd row: (blank)
4th row: info@printmedia.ca
5th row: shsieh@swrltd.com

Common Values

Value                               Count    Frequency (%)
(blank)                             30506    39.1%
info@publicstoragecanada.com        21       < 0.1%
info@taxwide.com                    20       < 0.1%
info@ucmas.ca                       13       < 0.1%
info@mississaugaschoolofmusic.ca    13       < 0.1%
cyclone@cyclonemfg.com              12       < 0.1%
millertrailers@rogers.com           12       < 0.1%
info@realfruitbubbletea.com         12       < 0.1%
info@akaloptical.com                12       < 0.1%
ktc.ca.info@kapsch.net              12       < 0.1%
Other values (15048)                47399    60.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
info@publicstoragecanada.com 21
 
< 0.1%
info@taxwide.com 20
 
< 0.1%
info@ucmas.ca 13
 
< 0.1%
info@mississaugaschoolofmusic.ca 13
 
< 0.1%
cyclone@cyclonemfg.com 12
 
< 0.1%
millertrailers@rogers.com 12
 
< 0.1%
info@realfruitbubbletea.com 12
 
< 0.1%
info@akaloptical.com 12
 
< 0.1%
ktc.ca.info@kapsch.net 12
 
< 0.1%
insure@all-risks.com 11
 
< 0.1%
Other values (15012) 47482
99.7%

Most occurring characters

ValueCountFrequency (%)
o 99086
 
9.0%
a 97080
 
8.8%
c 83214
 
7.6%
i 74076
 
6.7%
e 72811
 
6.6%
n 63754
 
5.8%
m 63062
 
5.7%
s 58432
 
5.3%
r 53466
 
4.9%
. 51798
 
4.7%
Other values (68) 382312
34.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 953466
86.8%
Other Punctuation 99332
 
9.0%
Space Separator 30707
 
2.8%
Decimal Number 11022
 
1.0%
Uppercase Letter 1925
 
0.2%
Dash Punctuation 1864
 
0.2%
Connector Punctuation 766
 
0.1%
Control 4
 
< 0.1%
Modifier Symbol 3
 
< 0.1%
Final Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 99086
10.4%
a 97080
10.2%
c 83214
 
8.7%
i 74076
 
7.8%
e 72811
 
7.6%
n 63754
 
6.7%
m 63062
 
6.6%
s 58432
 
6.1%
r 53466
 
5.6%
t 50375
 
5.3%
Other values (16) 238110
25.0%
Uppercase Letter
ValueCountFrequency (%)
I 281
14.6%
S 211
 
11.0%
M 203
 
10.5%
C 133
 
6.9%
A 122
 
6.3%
D 96
 
5.0%
P 88
 
4.6%
B 81
 
4.2%
J 79
 
4.1%
T 77
 
4.0%
Other values (16) 554
28.8%
Decimal Number
ValueCountFrequency (%)
1 1932
17.5%
0 1824
16.5%
2 1678
15.2%
3 975
8.8%
5 873
7.9%
4 804
7.3%
7 764
 
6.9%
6 755
 
6.8%
8 753
 
6.8%
9 664
 
6.0%
Other Punctuation
ValueCountFrequency (%)
. 51798
52.1%
@ 47451
47.8%
/ 35
 
< 0.1%
& 18
 
< 0.1%
, 8
 
< 0.1%
' 7
 
< 0.1%
# 5
 
< 0.1%
: 5
 
< 0.1%
· 5
 
< 0.1%
Space Separator
ValueCountFrequency (%)
30707
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1864
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 766
100.0%
Control
ValueCountFrequency (%)
4
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 3
100.0%
Final Punctuation
ValueCountFrequency (%)
1
100.0%
Math Symbol
ValueCountFrequency (%)
+ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 955391
86.9%
Common 143700
 
13.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 99086
10.4%
a 97080
10.2%
c 83214
 
8.7%
i 74076
 
7.8%
e 72811
 
7.6%
n 63754
 
6.7%
m 63062
 
6.6%
s 58432
 
6.1%
r 53466
 
5.6%
t 50375
 
5.3%
Other values (42) 240035
25.1%
Common
ValueCountFrequency (%)
. 51798
36.0%
@ 47451
33.0%
30707
21.4%
1 1932
 
1.3%
- 1864
 
1.3%
0 1824
 
1.3%
2 1678
 
1.2%
3 975
 
0.7%
5 873
 
0.6%
4 804
 
0.6%
Other values (16) 3794
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1099085
> 99.9%
None 5
 
< 0.1%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 99086
 
9.0%
a 97080
 
8.8%
c 83214
 
7.6%
i 74076
 
6.7%
e 72811
 
6.6%
n 63754
 
5.8%
m 63062
 
5.7%
s 58432
 
5.3%
r 53466
 
4.9%
. 51798
 
4.7%
Other values (66) 382306
34.8%
None
ValueCountFrequency (%)
· 5
100.0%
Punctuation
ValueCountFrequency (%)
1
100.0%

WebAddress
Categorical

Distinct: 14200
Distinct (%): 18.2%
Missing: 0
Missing (%): 0.0%
Memory size: 609.8 KiB

(blank)                 21267
www.dpcdsb.org          221
www.subway.com          215
www.timhortons.com      211
www.petro-canada.ca     115
Other values (14195)    56003

Length

Max length: 84
Median length: 50
Mean length: 14.525797
Min length: 1

Characters and Unicode

Total characters: 1133477
Distinct characters: 80
Distinct categories: 12
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2033
Unique (%): 2.6%

Sample

1st row: www.golftrendsinc.com
2nd row: www.apexgraphics.com
3rd row: (blank)
4th row: www.printmedia.ca
5th row: www.swrltd.com

Common Values

Value                                       Count    Frequency (%)
(blank)                                     21267    27.3%
www.dpcdsb.org                              221      0.3%
www.subway.com                              215      0.3%
www.timhortons.com                          211      0.3%
www.petro-canada.ca                         115      0.1%
www.shoppersdrugmart.ca                     107      0.1%
www.mississauga.ca/portal/residents/fire    95       0.1%
www.td.com                                  91       0.1%
www.dollarama.com                           88       0.1%
www.shell.ca                                84       0.1%
Other values (14190)                        55538    71.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
www.dpcdsb.org 221
 
0.4%
www.subway.com 215
 
0.4%
www.timhortons.com 211
 
0.4%
www.petro-canada.ca 115
 
0.2%
www.shoppersdrugmart.ca 107
 
0.2%
www.mississauga.ca/portal/residents/fire 95
 
0.2%
www.td.com 91
 
0.2%
www.dollarama.com 88
 
0.2%
www.shell.ca 84
 
0.1%
www.starbucks.ca 83
 
0.1%
Other values (14093) 55516
97.7%

Most occurring characters

ValueCountFrequency (%)
w 178470
15.7%
. 114796
 
10.1%
c 90000
 
7.9%
a 87304
 
7.7%
o 81312
 
7.2%
e 65391
 
5.8%
m 55954
 
4.9%
s 50675
 
4.5%
i 50384
 
4.4%
r 49832
 
4.4%
Other values (70) 309359
27.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 989738
87.3%
Other Punctuation 116174
 
10.2%
Space Separator 21324
 
1.9%
Dash Punctuation 2684
 
0.2%
Decimal Number 2467
 
0.2%
Uppercase Letter 1007
 
0.1%
Math Symbol 52
 
< 0.1%
Control 10
 
< 0.1%
Connector Punctuation 10
 
< 0.1%
Modifier Symbol 8
 
< 0.1%
Other values (2) 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
w 178470
18.0%
c 90000
 
9.1%
a 87304
 
8.8%
o 81312
 
8.2%
e 65391
 
6.6%
m 55954
 
5.7%
s 50675
 
5.1%
i 50384
 
5.1%
r 49832
 
5.0%
t 47223
 
4.8%
Other values (17) 233193
23.6%
Uppercase Letter
ValueCountFrequency (%)
C 108
 
10.7%
W 105
 
10.4%
S 71
 
7.1%
M 70
 
7.0%
T 59
 
5.9%
A 57
 
5.7%
L 57
 
5.7%
F 52
 
5.2%
R 51
 
5.1%
P 41
 
4.1%
Other values (16) 336
33.4%
Decimal Number
ValueCountFrequency (%)
1 551
22.3%
2 475
19.3%
0 349
14.1%
4 324
13.1%
3 230
9.3%
6 129
 
5.2%
8 119
 
4.8%
9 119
 
4.8%
5 101
 
4.1%
7 70
 
2.8%
Other Punctuation
ValueCountFrequency (%)
. 114796
98.8%
/ 1297
 
1.1%
@ 47
 
< 0.1%
& 18
 
< 0.1%
\ 6
 
< 0.1%
, 4
 
< 0.1%
: 3
 
< 0.1%
' 2
 
< 0.1%
· 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
21324
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2684
100.0%
Math Symbol
ValueCountFrequency (%)
~ 52
100.0%
Control
ValueCountFrequency (%)
10
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 10
100.0%
Modifier Symbol
ValueCountFrequency (%)
` 8
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 990745
87.4%
Common 142732
 
12.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
w 178470
18.0%
c 90000
 
9.1%
a 87304
 
8.8%
o 81312
 
8.2%
e 65391
 
6.6%
m 55954
 
5.6%
s 50675
 
5.1%
i 50384
 
5.1%
r 49832
 
5.0%
t 47223
 
4.8%
Other values (43) 234200
23.6%
Common
ValueCountFrequency (%)
. 114796
80.4%
21324
 
14.9%
- 2684
 
1.9%
/ 1297
 
0.9%
1 551
 
0.4%
2 475
 
0.3%
0 349
 
0.2%
4 324
 
0.2%
3 230
 
0.2%
6 129
 
0.1%
Other values (17) 573
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1133473
> 99.9%
None 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
w 178470
15.7%
. 114796
 
10.1%
c 90000
 
7.9%
a 87304
 
7.7%
o 81312
 
7.2%
e 65391
 
5.8%
m 55954
 
4.9%
s 50675
 
4.5%
i 50384
 
4.4%
r 49832
 
4.4%
Other values (68) 309355
27.3%
None
ValueCountFrequency (%)
é 3
75.0%
· 1
 
25.0%

EmplRange
Categorical

Distinct: 9
Distinct (%): < 0.1%
Missing: 1
Missing (%): < 0.1%
Memory size: 609.8 KiB

1 to 4              37311
5 to 9              16050
10 to 19            10510
20 to 49            8120
50 to 99            3313
Other values (4)    2727
Length

Max length10
Median length6
Mean length6.6960567
Min length5

Characters and Unicode

Total characters522500
Distinct characters11
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row10 to 19
2nd row20 to 49
3rd row50 to 99
4th row1 to 4
5th row5 to 9

Common Values

Value        Count  Frequency (%)
1 to 4       37311  47.8%
5 to 9       16050  20.6%
10 to 19     10510  13.5%
20 to 49     8120   10.4%
50 to 99     3313   4.2%
100 to 299   2149   2.8%
300 to 499   318    0.4%
500 to 999   164    0.2%
1000+        96     0.1%
(Missing)    1      < 0.1%
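
The nine EmplRange labels have a natural order that a plain categorical does not preserve (sorted as strings, "10 to 19" would come before "5 to 9"). A minimal sketch, assuming pandas and a hypothetical helper name, of converting the column to an ordered categorical; the labels are copied from the table above and the sample values are made up:

```python
import pandas as pd

# Category labels copied from the Common Values table, smallest to largest.
EMPL_RANGE_ORDER = [
    "1 to 4", "5 to 9", "10 to 19", "20 to 49", "50 to 99",
    "100 to 299", "300 to 499", "500 to 999", "1000+",
]


def to_ordered_range(values: pd.Series) -> pd.Series:
    """Convert EmplRange strings to an ordered categorical (hypothetical helper)."""
    dtype = pd.CategoricalDtype(categories=EMPL_RANGE_ORDER, ordered=True)
    return values.astype(dtype)


# Toy sample, not taken from the dataset; None stands in for the missing row.
sample = pd.Series(["10 to 19", "1 to 4", "1000+", None, "5 to 9"])
ranges = to_ordered_range(sample)

# Sorting now follows employee-count order, with the missing value last.
print(ranges.sort_values().tolist())
# Ordered categoricals support comparisons against a category label.
print((ranges >= "10 to 19").sum())
```

With the ordered dtype, sorting, min/max, and threshold filters follow the size of the range rather than alphabetical order.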

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot not reproduced in this text version.]
ValueCountFrequency (%)
to 77935
33.3%
1 37311
16.0%
4 37311
16.0%
5 16050
 
6.9%
9 16050
 
6.9%
10 10510
 
4.5%
19 10510
 
4.5%
20 8120
 
3.5%
49 8120
 
3.5%
99 3313
 
1.4%
Other values (8) 8671
 
3.7%

Most occurring characters

ValueCountFrequency (%)
155870
29.8%
t 77935
14.9%
o 77935
14.9%
1 60576
 
11.6%
9 46732
 
8.9%
4 45749
 
8.8%
0 27493
 
5.3%
5 19527
 
3.7%
2 10269
 
2.0%
3 318
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 210664
40.3%
Space Separator 155870
29.8%
Lowercase Letter 155870
29.8%
Math Symbol 96
 
< 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 60576
28.8%
9 46732
22.2%
4 45749
21.7%
0 27493
13.1%
5 19527
 
9.3%
2 10269
 
4.9%
3 318
 
0.2%
Lowercase Letter
ValueCountFrequency (%)
t 77935
50.0%
o 77935
50.0%
Space Separator
ValueCountFrequency (%)
155870
100.0%
Math Symbol
ValueCountFrequency (%)
+ 96
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 366630
70.2%
Latin 155870
29.8%

Most frequent character per script

Common
ValueCountFrequency (%)
155870
42.5%
1 60576
 
16.5%
9 46732
 
12.7%
4 45749
 
12.5%
0 27493
 
7.5%
5 19527
 
5.3%
2 10269
 
2.8%
3 318
 
0.1%
+ 96
 
< 0.1%
Latin
ValueCountFrequency (%)
t 77935
50.0%
o 77935
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 522500
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
155870
29.8%
t 77935
14.9%
o 77935
14.9%
1 60576
 
11.6%
9 46732
 
8.9%
4 45749
 
8.8%
0 27493
 
5.3%
5 19527
 
3.7%
2 10269
 
2.0%
3 318
 
0.1%

EmplUpdate
Categorical

HIGH CARDINALITY
MISSING

Distinct433
Distinct (%)0.7%
Missing15002
Missing (%)19.2%
Memory size609.8 KiB
2017/11/08 00:00:00+00
11037 
2018/12/30 00:00:00+00
9918 
2017/11/09 00:00:00+00
8042 
2015/10/31 00:00:00+00
4560 
2016/10/31 00:00:00+00
4499 
Other values (428)
24974 

Length

Max length22
Median length22
Mean length22
Min length22

Characters and Unicode

Total characters1386660
Distinct characters14
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique111 ?
Unique (%)0.2%

Sample

1st row2015/10/31 00:00:00+00
2nd row2016/10/31 00:00:00+00
3rd row2015/10/31 00:00:00+00
4th row2015/10/31 00:00:00+00
5th row2015/10/31 00:00:00+00

Common Values

Value                   Count  Frequency (%)
2017/11/08 00:00:00+00  11037  14.1%
2018/12/30 00:00:00+00  9918   12.7%
2017/11/09 00:00:00+00  8042   10.3%
2015/10/31 00:00:00+00  4560   5.8%
2016/10/31 00:00:00+00  4499   5.8%
2019/12/12 00:00:00+00  3326   4.3%
2019/09/19 00:00:00+00  2718   3.5%
2018/09/30 00:00:00+00  849    1.1%
2017/06/08 00:00:00+00  726    0.9%
2017/05/24 00:00:00+00  646    0.8%
Other values (423)      16709  21.4%
(Missing)               15002  19.2%
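
EmplUpdate is profiled as a categorical because its values are fixed-width timestamp strings (every value is 22 characters long). A minimal sketch, assuming pandas, of parsing them into real timestamps. The helper name is hypothetical, and the reading of the trailing "+00" as a zero UTC offset is an assumption rather than something the report states:

```python
import pandas as pd


def parse_update_date(values: pd.Series) -> pd.Series:
    """Parse 'YYYY/MM/DD HH:MM:SS+00' strings into UTC timestamps (hypothetical helper)."""
    # Keep the first 19 characters ('YYYY/MM/DD HH:MM:SS') and drop the '+00'
    # suffix, which is assumed here to mean a zero UTC offset.
    naive = pd.to_datetime(
        values.str.slice(0, 19),
        format="%Y/%m/%d %H:%M:%S",
        errors="coerce",  # unparseable or missing values become NaT
    )
    return naive.dt.tz_localize("UTC")


# Two values from the table above plus one missing entry.
sample = pd.Series(["2017/11/08 00:00:00+00", "2018/12/30 00:00:00+00", None])
parsed = parse_update_date(sample)
print(parsed)
```

Once parsed, the column can be grouped by year or month instead of being treated as 433 unrelated categories.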

Length

Histogram of lengths of the category
ValueCountFrequency (%)
00:00:00+00 63030
50.0%
2017/11/08 11037
 
8.8%
2018/12/30 9918
 
7.9%
2017/11/09 8042
 
6.4%
2015/10/31 4560
 
3.6%
2016/10/31 4499
 
3.6%
2019/12/12 3326
 
2.6%
2019/09/19 2718
 
2.2%
2018/09/30 849
 
0.7%
2017/06/08 726
 
0.6%
Other values (424) 17355
 
13.8%

Most occurring characters

ValueCountFrequency (%)
0 635253
45.8%
1 147277
 
10.6%
/ 126060
 
9.1%
: 126060
 
9.1%
2 85561
 
6.2%
63030
 
4.5%
+ 63030
 
4.5%
7 33726
 
2.4%
8 28228
 
2.0%
9 23888
 
1.7%
Other values (4) 54547
 
3.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1008480
72.7%
Other Punctuation 252120
 
18.2%
Space Separator 63030
 
4.5%
Math Symbol 63030
 
4.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 635253
63.0%
1 147277
 
14.6%
2 85561
 
8.5%
7 33726
 
3.3%
8 28228
 
2.8%
9 23888
 
2.4%
3 22832
 
2.3%
5 16305
 
1.6%
6 13078
 
1.3%
4 2332
 
0.2%
Other Punctuation
ValueCountFrequency (%)
/ 126060
50.0%
: 126060
50.0%
Space Separator
ValueCountFrequency (%)
63030
100.0%
Math Symbol
ValueCountFrequency (%)
+ 63030
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1386660
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 635253
45.8%
1 147277
 
10.6%
/ 126060
 
9.1%
: 126060
 
9.1%
2 85561
 
6.2%
63030
 
4.5%
+ 63030
 
4.5%
7 33726
 
2.4%
8 28228
 
2.0%
9 23888
 
1.7%
Other values (4) 54547
 
3.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1386660
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 635253
45.8%
1 147277
 
10.6%
/ 126060
 
9.1%
: 126060
 
9.1%
2 85561
 
6.2%
63030
 
4.5%
+ 63030
 
4.5%
7 33726
 
2.4%
8 28228
 
2.0%
9 23888
 
1.7%
Other values (4) 54547
 
3.9%

Sector_Des
Categorical

HIGH CORRELATION
MISSING

Distinct29
Distinct (%)0.2%
Missing63430
Missing (%)81.3%
Memory size609.8 KiB
12383 
Financial Services
 
870
Food and Beverage
 
444
Automotive
 
329
Life Sciences
 
263
Other values (24)
 
313

Length

Max length57
Median length1
Mean length3.3009177
Min length1

Characters and Unicode

Total characters48200
Distinct characters26
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6 ?
Unique (%)< 0.1%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value                                   Count  Frequency (%)
(blank)                                 12383  15.9%
Financial Services                      870    1.1%
Food and Beverage                       444    0.6%
Automotive                              329    0.4%
Life Sciences                           263    0.3%
Aerospace                               132    0.2%
Automotive,Aerospace                    55     0.1%
Cleantech                               24     < 0.1%
Automotive,Food and Beverage            24     < 0.1%
Automotive,Aerospace,Food and Beverage  15     < 0.1%
Other values (19)                       63     0.1%
(Missing)                               63430  81.3%
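
Two things stand out in this table: the most common non-missing value is a blank string (minimum and median length are both 1, so it is probably a single space), and some values pack several sectors into one comma-separated string. A minimal sketch, assuming pandas and NumPy and using hypothetical helper names, of treating blank strings as missing and splitting the lists into one row per sector. The sample values are illustrative:

```python
import numpy as np
import pandas as pd


def clean_sectors(values: pd.Series) -> pd.Series:
    """Turn empty or whitespace-only strings into missing values (hypothetical helper)."""
    return values.replace(r"^\s*$", np.nan, regex=True)


def explode_sectors(values: pd.Series) -> pd.Series:
    """Split comma-separated sector lists into one row per sector."""
    return values.dropna().str.split(",").explode().str.strip()


# Toy sample: one blank, one multi-sector value, one missing, one single sector.
sample = pd.Series(
    [" ", "Automotive,Aerospace", None, "Financial Services"], dtype="object"
)
cleaned = clean_sectors(sample)
sectors = explode_sectors(cleaned)

print(cleaned.isna().sum())
print(sectors.value_counts())
```

If the blank rows are counted as missing, the effective missing share for Sector_Des would be higher than the 81.3% reported above.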

Length

Histogram of lengths of the category
ValueCountFrequency (%)
services 884
19.8%
financial 870
19.5%
and 528
11.9%
beverage 514
11.5%
food 452
10.1%
automotive 329
 
7.4%
life 281
 
6.3%
sciences 265
 
5.9%
aerospace 132
 
3.0%
automotive,aerospace 55
 
1.2%
Other values (15) 145
 
3.3%

Most occurring characters

ValueCountFrequency (%)
14623
30.3%
e 5221
 
10.8%
i 3691
 
7.7%
a 3091
 
6.4%
c 2627
 
5.5%
n 2626
 
5.4%
o 2183
 
4.5%
v 1859
 
3.9%
r 1645
 
3.4%
s 1413
 
2.9%
Other values (16) 9221
19.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 29243
60.7%
Space Separator 14623
30.3%
Uppercase Letter 4130
 
8.6%
Other Punctuation 204
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 5221
17.9%
i 3691
12.6%
a 3091
10.6%
c 2627
9.0%
n 2626
9.0%
o 2183
7.5%
v 1859
 
6.4%
r 1645
 
5.6%
s 1413
 
4.8%
d 1056
 
3.6%
Other values (8) 3831
13.1%
Uppercase Letter
ValueCountFrequency (%)
F 1412
34.2%
S 1180
28.6%
A 680
16.5%
B 528
 
12.8%
L 296
 
7.2%
C 34
 
0.8%
Space Separator
ValueCountFrequency (%)
14623
100.0%
Other Punctuation
ValueCountFrequency (%)
, 204
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 33373
69.2%
Common 14827
30.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 5221
15.6%
i 3691
11.1%
a 3091
9.3%
c 2627
 
7.9%
n 2626
 
7.9%
o 2183
 
6.5%
v 1859
 
5.6%
r 1645
 
4.9%
s 1413
 
4.2%
F 1412
 
4.2%
Other values (14) 7605
22.8%
Common
ValueCountFrequency (%)
14623
98.6%
, 204
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 48200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
14623
30.3%
e 5221
 
10.8%
i 3691
 
7.7%
a 3091
 
6.4%
c 2627
 
5.5%
n 2626
 
5.4%
o 2183
 
4.5%
v 1859
 
3.9%
r 1645
 
3.4%
s 1413
 
2.9%
Other values (16) 9221
19.1%

CENT_X
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct 4685
Distinct (%) 15.4%
Missing 47693
Missing (%) 61.1%
Infinite 0
Infinite (%) 0.0%
Mean 608659.35
Minimum 596627.93
Maximum 616985.06
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 609.8 KiB

Quantile statistics

Minimum 596627.93
5-th percentile 601465.65
Q1 606483.02
median 608923.98
Q3 611391.08
95-th percentile 614814.86
Maximum 616985.06
Range 20357.121
Interquartile range (IQR) 4908.0572

Descriptive statistics

Standard deviation 3852.0245
Coefficient of variation (CV) 0.0063287033
Kurtosis -0.066028416
Mean 608659.35
Median Absolute Deviation (MAD) 2462.861
Skewness -0.41317914
Sum 1.8466116 × 10^10
Variance 14838093
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
609556.5032 367
 
0.5%
612552.1674 255
 
0.3%
604009.418 228
 
0.3%
609657.7584 205
 
0.3%
615480.8966 178
 
0.2%
604848.575 110
 
0.1%
608539.0792 107
 
0.1%
612581.1624 106
 
0.1%
608826.735 100
 
0.1%
600161.54 100
 
0.1%
Other values (4675) 28583
36.6%
(Missing) 47693
61.1%
ValueCountFrequency (%)
596627.9342 2
 
< 0.1%
596752.9696 2
 
< 0.1%
597309.0542 3
 
< 0.1%
597312.632 2
 
< 0.1%
597772.3526 49
0.1%
597782.4012 2
 
< 0.1%
597812.404 2
 
< 0.1%
597933.2448 13
 
< 0.1%
597963.9396 25
< 0.1%
598104.1884 24
< 0.1%
ValueCountFrequency (%)
616985.0552 9
< 0.1%
616917.8604 1
 
< 0.1%
616879.86 1
 
< 0.1%
616836.9092 2
 
< 0.1%
616794.193 2
 
< 0.1%
616756.05 2
 
< 0.1%
616706.7026 2
 
< 0.1%
616695.363 4
< 0.1%
616668.1574 2
 
< 0.1%
616652.9546 1
 
< 0.1%

CENT_Y
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct 7965
Distinct (%) 26.3%
Missing 47693
Missing (%) 61.1%
Infinite 0
Infinite (%) 0.0%
Mean 4829613.5
Minimum 4815546.6
Maximum 4843107.8
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 609.8 KiB

Quantile statistics

Minimum 4815546.6
5-th percentile 4819703.7
Q1 4825956.9
median 4829277.7
Q3 4833786.4
95-th percentile 4839313.8
Maximum 4843107.8
Range 27561.199
Interquartile range (IQR) 7829.5471

Descriptive statistics

Standard deviation 5660.9074
Coefficient of variation (CV) 0.0011721243
Kurtosis -0.58959863
Mean 4829613.5
Median Absolute Deviation (MAD) 3923.2538
Skewness -0.0065033252
Sum 1.4652564 × 10^11
Variance 32045872
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
4837278.362 255
 
0.3%
4827620.949 185
 
0.2%
4827620.949 182
 
0.2%
4823628.592 115
 
0.1%
4823628.592 113
 
0.1%
4841687.188 107
 
0.1%
4841687.188 98
 
0.1%
4827728.859 91
 
0.1%
4827728.859 87
 
0.1%
4822083.931 86
 
0.1%
Other values (7955) 29020
37.2%
(Missing) 47693
61.1%
ValueCountFrequency (%)
4815546.641 1
< 0.1%
4815609.051 1
< 0.1%
4815609.051 1
< 0.1%
4816109.607 2
< 0.1%
4816333.508 2
< 0.1%
4816381.801 2
< 0.1%
4816381.801 2
< 0.1%
4816389.354 1
< 0.1%
4816389.354 1
< 0.1%
4816462.515 1
< 0.1%
ValueCountFrequency (%)
4843107.84 9
< 0.1%
4843107.84 10
< 0.1%
4843040.829 1
 
< 0.1%
4843040.829 1
 
< 0.1%
4842998.68 1
 
< 0.1%
4842998.68 1
 
< 0.1%
4842855.077 1
 
< 0.1%
4842855.077 1
 
< 0.1%
4842717.945 1
 
< 0.1%
4842717.945 1
 
< 0.1%
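
Several CENT_Y values appear twice in the tables above with separate counts (for example 4827620.949 with 185 and 182 rows), which suggests the stored values differ only beyond the displayed precision. A minimal sketch, assuming pandas and made-up coordinates, of comparing distinct counts before and after rounding. Whether such near-duplicates should be merged depends on the data and is not something the report settles:

```python
import pandas as pd

# Made-up coordinates that differ only beyond the third decimal place.
sample = pd.Series([4827620.9491, 4827620.9493, 4827620.9491, 4823628.5921])

raw_counts = sample.value_counts()
rounded_counts = sample.round(3).value_counts()

# More distinct values before rounding than after indicates near-duplicates.
print(len(raw_counts), len(rounded_counts))
print(rounded_counts)
```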

Year
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size609.8 KiB
2019
16518 
2018
16350 
2017
15737 
2021
14825 
2016
14602 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters312128
Distinct characters7
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2016
2nd row2016
3rd row2016
4th row2016
5th row2016

Common Values

Value  Count  Frequency (%)
2019   16518  21.2%
2018   16350  21.0%
2017   15737  20.2%
2021   14825  19.0%
2016   14602  18.7%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot not reproduced in this text version.]
ValueCountFrequency (%)
2019 16518
21.2%
2018 16350
21.0%
2017 15737
20.2%
2021 14825
19.0%
2016 14602
18.7%

Most occurring characters

ValueCountFrequency (%)
2 92857
29.7%
0 78032
25.0%
1 78032
25.0%
9 16518
 
5.3%
8 16350
 
5.2%
7 15737
 
5.0%
6 14602
 
4.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 312128
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 92857
29.7%
0 78032
25.0%
1 78032
25.0%
9 16518
 
5.3%
8 16350
 
5.2%
7 15737
 
5.0%
6 14602
 
4.7%

Most occurring scripts

ValueCountFrequency (%)
Common 312128
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 92857
29.7%
0 78032
25.0%
1 78032
25.0%
9 16518
 
5.3%
8 16350
 
5.2%
7 15737
 
5.0%
6 14602
 
4.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 312128
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 92857
29.7%
0 78032
25.0%
1 78032
25.0%
9 16518
 
5.3%
8 16350
 
5.2%
7 15737
 
5.0%
6 14602
 
4.7%

PIN
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct 4961
Distinct (%) 10.4%
Missing 30339
Missing (%) 38.9%
Infinite 0
Infinite (%) 0.0%
Mean 11122872
Minimum 32500
Maximum 32656400
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 609.8 KiB

Quantile statistics

Minimum 32500
5-th percentile 1878100
Q1 5158600
median 10172700
Q3 14774800
95-th percentile 28577700
Maximum 32656400
Range 32623900
Interquartile range (IQR) 9616200

Descriptive statistics

Standard deviation 7579367.8
Coefficient of variation (CV) 0.68142186
Kurtosis 0.64239942
Mean 11122872
Median Absolute Deviation (MAD) 4630200
Skewness 1.0445894
Sum 5.3048311 × 10^11
Variance 5.7446816 × 10^13
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
6068300 586
 
0.8%
31141506 414
 
0.5%
4407700 328
 
0.4%
9663800 287
 
0.4%
12876900 216
 
0.3%
24265600 190
 
0.2%
14804200 186
 
0.2%
31381800 177
 
0.2%
17704200 161
 
0.2%
10173700 147
 
0.2%
Other values (4951) 45001
57.7%
(Missing) 30339
38.9%
ValueCountFrequency (%)
32500 3
 
< 0.1%
37200 10
 
< 0.1%
37300 2
 
< 0.1%
37400 33
< 0.1%
38100 2
 
< 0.1%
38300 9
 
< 0.1%
38400 14
< 0.1%
38500 2
 
< 0.1%
38600 13
 
< 0.1%
38700 1
 
< 0.1%
ValueCountFrequency (%)
32656400 1
 
< 0.1%
32646400 44
0.1%
32551400 1
 
< 0.1%
32526400 2
 
< 0.1%
32476400 11
 
< 0.1%
32442000 5
 
< 0.1%
32441600 2
 
< 0.1%
32436400 25
< 0.1%
32431500 43
0.1%
32371800 1
 
< 0.1%
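
PIN is profiled as a real number, most likely because the missing values force a floating-point column, yet every value shown above is a whole number. If PIN is an identifier (an assumption; the report does not say what it stands for), the mean, skewness, and similar statistics carry little meaning. A minimal sketch, assuming pandas and NumPy, of storing such a column as a nullable integer so the missing entries are kept:

```python
import numpy as np
import pandas as pd

# Three PIN values from the tables above plus one missing entry.
sample = pd.Series([6068300.0, np.nan, 31141506.0, 32500.0])

# "Int64" (capital I) is the pandas nullable integer dtype; NaN becomes <NA>.
pin = sample.astype("Int64")
print(pin.dtype, pin.isna().sum())
```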

Character
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct56
Distinct (%)0.3%
Missing61682
Missing (%)79.0%
Memory size609.8 KiB
Northeast EA (West)
4700 
Dixie EA
1048 
Gateway EA (East)
1034 
Meadowvale Business Park CC
998 
Western Business Park EA
847 
Other values (51)
7723 

Length

Max length27
Median length23
Mean length16.546361
Min length7

Characters and Unicode

Total characters270533
Distinct characters43
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)< 0.1%

Sample

1st rowCooksville NHD (East)
2nd rowRathwood NHD
3rd rowCooksville NHD (East)
4th rowRathwood-Applewood CN
5th rowCooksville NHD (East)

Common Values

Value                        Count  Frequency (%)
Northeast EA (West)          4700   6.0%
Dixie EA                     1048   1.3%
Gateway EA (East)            1034   1.3%
Meadowvale Business Park CC  998    1.3%
Western Business Park EA     847    1.1%
DT Core                      738    0.9%
Airport CC                   507    0.6%
Northeast EA (East)          411    0.5%
DT Cooksville                409    0.5%
Mavis-Erindale EA            392    0.5%
Other values (46)            5266   6.7%
(Missing)                    61682  79.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ea 8946
19.7%
northeast 5111
 
11.3%
west 5028
 
11.1%
nhd 2823
 
6.2%
park 2036
 
4.5%
east 1943
 
4.3%
business 1845
 
4.1%
cc 1768
 
3.9%
gateway 1473
 
3.2%
dt 1329
 
2.9%
Other values (45) 13071
28.8%

Most occurring characters

ValueCountFrequency (%)
29023
 
10.7%
e 24540
 
9.1%
t 23397
 
8.6%
s 21055
 
7.8%
a 17998
 
6.7%
r 14002
 
5.2%
o 12439
 
4.6%
E 11848
 
4.4%
A 10106
 
3.7%
i 9697
 
3.6%
Other values (33) 96428
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 162448
60.0%
Uppercase Letter 65047
24.0%
Space Separator 29023
 
10.7%
Open Punctuation 6677
 
2.5%
Close Punctuation 6677
 
2.5%
Dash Punctuation 661
 
0.2%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 24540
15.1%
t 23397
14.4%
s 21055
13.0%
a 17998
11.1%
r 14002
8.6%
o 12439
7.7%
i 9697
 
6.0%
l 6578
 
4.0%
h 5996
 
3.7%
n 5527
 
3.4%
Other values (11) 21219
13.1%
Uppercase Letter
ValueCountFrequency (%)
E 11848
18.2%
A 10106
15.5%
N 9262
14.2%
C 7359
11.3%
W 5875
9.0%
D 5200
8.0%
H 3127
 
4.8%
M 2865
 
4.4%
P 2537
 
3.9%
B 1845
 
2.8%
Other values (8) 5023
7.7%
Space Separator
ValueCountFrequency (%)
29023
100.0%
Open Punctuation
ValueCountFrequency (%)
( 6677
100.0%
Close Punctuation
ValueCountFrequency (%)
) 6677
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 661
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 227495
84.1%
Common 43038
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 24540
 
10.8%
t 23397
 
10.3%
s 21055
 
9.3%
a 17998
 
7.9%
r 14002
 
6.2%
o 12439
 
5.5%
E 11848
 
5.2%
A 10106
 
4.4%
i 9697
 
4.3%
N 9262
 
4.1%
Other values (29) 73151
32.2%
Common
ValueCountFrequency (%)
29023
67.4%
( 6677
 
15.5%
) 6677
 
15.5%
- 661
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 270533
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
29023
 
10.7%
e 24540
 
9.1%
t 23397
 
8.6%
s 21055
 
7.8%
a 17998
 
6.7%
r 14002
 
5.2%
o 12439
 
4.6%
E 11848
 
4.4%
A 10106
 
3.7%
i 9697
 
3.6%
Other values (33) 96428
35.6%

CHArea
Categorical

HIGH CARDINALITY
HIGH CORRELATION
MISSING

Distinct57
Distinct (%)0.2%
Missing46689
Missing (%)59.8%
Memory size609.8 KiB
Northeast EA (West)
8989 
Gateway EA (East)
1975 
Dixie EA
1955 
Meadowvale Business Park CC
1898 
Western Business Park EA
1636 
Other values (52)
14890 

Length

Max length27
Median length23
Mean length16.534633
Min length7

Characters and Unicode

Total characters518245
Distinct characters44
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNortheast EA (West)
2nd rowDT Core
3rd rowNortheast EA (West)
4th rowDT Core
5th rowDT Core

Common Values

Value                        Count  Frequency (%)
Northeast EA (West)          8989   11.5%
Gateway EA (East)            1975   2.5%
Dixie EA                     1955   2.5%
Meadowvale Business Park CC  1898   2.4%
Western Business Park EA     1636   2.1%
DT Core                      1477   1.9%
Airport CC                   996    1.3%
Northeast EA (East)          804    1.0%
Mavis-Erindale EA            784    1.0%
DT Cooksville                724    0.9%
Other values (47)            10105  12.9%
(Missing)                    46689  59.8%
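
CHArea and Character share nearly the same set of labels (57 and 56 distinct values, with the same top categories), so the two columns may describe the same attribute. This is an inference from the tables, not something the report states. A minimal sketch, assuming pandas and made-up rows, of measuring how often the two columns agree where both are present:

```python
import pandas as pd

# Toy rows; the labels come from the tables above, the pairing is invented.
df = pd.DataFrame({
    "Character": ["DT Core", None, "Dixie EA", "Airport CC"],
    "CHArea": ["DT Core", "Dixie EA", None, "DT Core"],
})

# Restrict to rows where both columns are filled in, then compare.
both = df.dropna(subset=["Character", "CHArea"])
agreement = (both["Character"] == both["CHArea"]).mean()
print(len(both), agreement)
```

A high agreement rate on the real data would support merging the two columns; a low one would suggest they track different things despite the shared vocabulary.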

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ea 17070
19.6%
northeast 9793
 
11.3%
west 9630
 
11.1%
nhd 5337
 
6.1%
park 3923
 
4.5%
east 3694
 
4.2%
business 3534
 
4.1%
cc 3445
 
4.0%
gateway 2875
 
3.3%
dt 2519
 
2.9%
Other values (48) 25104
28.9%

Most occurring characters

ValueCountFrequency (%)
55581
 
10.7%
e 47046
 
9.1%
t 44934
 
8.7%
s 40159
 
7.7%
a 34746
 
6.7%
r 27014
 
5.2%
o 23860
 
4.6%
E 22566
 
4.4%
A 19328
 
3.7%
i 18277
 
3.5%
Other values (34) 184734
35.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 311221
60.1%
Uppercase Letter 124590
24.0%
Space Separator 55581
 
10.7%
Close Punctuation 12769
 
2.5%
Open Punctuation 12769
 
2.5%
Dash Punctuation 1315
 
0.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 47046
15.1%
t 44934
14.4%
s 40159
12.9%
a 34746
11.2%
r 27014
8.7%
o 23860
7.7%
i 18277
 
5.9%
l 12367
 
4.0%
h 11504
 
3.7%
n 10684
 
3.4%
Other values (12) 40630
13.1%
Uppercase Letter
ValueCountFrequency (%)
E 22566
18.1%
A 19328
15.5%
N 17803
14.3%
C 14198
11.4%
W 11322
9.1%
D 9811
7.9%
H 5878
 
4.7%
M 5574
 
4.5%
P 4873
 
3.9%
B 3534
 
2.8%
Other values (8) 9703
7.8%
Space Separator
ValueCountFrequency (%)
55581
100.0%
Close Punctuation
ValueCountFrequency (%)
) 12769
100.0%
Open Punctuation
ValueCountFrequency (%)
( 12769
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1315
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 435811
84.1%
Common 82434
 
15.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 47046
 
10.8%
t 44934
 
10.3%
s 40159
 
9.2%
a 34746
 
8.0%
r 27014
 
6.2%
o 23860
 
5.5%
E 22566
 
5.2%
A 19328
 
4.4%
i 18277
 
4.2%
N 17803
 
4.1%
Other values (30) 140078
32.1%
Common
ValueCountFrequency (%)
55581
67.4%
) 12769
 
15.5%
( 12769
 
15.5%
- 1315
 
1.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 518245
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
55581
 
10.7%
e 47046
 
9.1%
t 44934
 
8.7%
s 40159
 
7.7%
a 34746
 
6.7%
r 27014
 
5.2%
o 23860
 
4.6%
E 22566
 
4.4%
A 19328
 
3.7%
i 18277
 
3.5%
Other values (34) 184734
35.6%

Modified
Categorical

HIGH CARDINALITY
MISSING

Distinct189
Distinct (%)1.3%
Missing63217
Missing (%)81.0%
Memory size609.8 KiB
2018/12/30 00:00:00+00
2771 
2019/12/12 00:00:00+00
1848 
2019/09/19 00:00:00+00
1586 
2017/11/09 00:00:00+00
1111 
2017/11/08 00:00:00+00
968 
Other values (184)
6531 

Length

Max length22
Median length22
Mean length22
Min length22

Characters and Unicode

Total characters325930
Distinct characters14
Distinct categories4 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique50 ?
Unique (%)0.3%

Sample

1st row2021/06/25 00:00:00+00
2nd row2021/06/03 00:00:00+00
3rd row2021/07/15 00:00:00+00
4th row2021/07/15 00:00:00+00
5th row2021/07/15 00:00:00+00

Common Values

Value                   Count  Frequency (%)
2018/12/30 00:00:00+00  2771   3.6%
2019/12/12 00:00:00+00  1848   2.4%
2019/09/19 00:00:00+00  1586   2.0%
2017/11/09 00:00:00+00  1111   1.4%
2017/11/08 00:00:00+00  968    1.2%
2021/07/02 00:00:00+00  354    0.5%
2019/06/07 00:00:00+00  267    0.3%
2021/05/21 00:00:00+00  186    0.2%
2018/09/30 00:00:00+00  177    0.2%
2021/05/17 00:00:00+00  168    0.2%
Other values (179)      5379   6.9%
(Missing)               63217  81.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
00:00:00+00 14815
50.0%
2018/12/30 2771
 
9.4%
2019/12/12 1848
 
6.2%
2019/09/19 1586
 
5.4%
2017/11/09 1111
 
3.7%
2017/11/08 968
 
3.3%
2021/07/02 354
 
1.2%
2019/06/07 267
 
0.9%
2021/05/21 186
 
0.6%
2018/09/30 177
 
0.6%
Other values (180) 5547
 
18.7%

Most occurring characters

ValueCountFrequency (%)
0 148805
45.7%
1 29895
 
9.2%
/ 29630
 
9.1%
: 29630
 
9.1%
2 29181
 
9.0%
14815
 
4.5%
+ 14815
 
4.5%
9 8963
 
2.7%
7 6006
 
1.8%
8 5090
 
1.6%
Other values (4) 9100
 
2.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 237040
72.7%
Other Punctuation 59260
 
18.2%
Space Separator 14815
 
4.5%
Math Symbol 14815
 
4.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 148805
62.8%
1 29895
 
12.6%
2 29181
 
12.3%
9 8963
 
3.8%
7 6006
 
2.5%
8 5090
 
2.1%
3 3797
 
1.6%
6 2508
 
1.1%
5 2286
 
1.0%
4 509
 
0.2%
Other Punctuation
ValueCountFrequency (%)
/ 29630
50.0%
: 29630
50.0%
Space Separator
ValueCountFrequency (%)
14815
100.0%
Math Symbol
ValueCountFrequency (%)
+ 14815
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 325930
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 148805
45.7%
1 29895
 
9.2%
/ 29630
 
9.1%
: 29630
 
9.1%
2 29181
 
9.0%
14815
 
4.5%
+ 14815
 
4.5%
9 8963
 
2.7%
7 6006
 
1.8%
8 5090
 
1.6%
Other values (4) 9100
 
2.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 325930
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 148805
45.7%
1 29895
 
9.2%
/ 29630
 
9.1%
: 29630
 
9.1%
2 29181
 
9.0%
14815
 
4.5%
+ 14815
 
4.5%
9 8963
 
2.7%
7 6006
 
1.8%
8 5090
 
1.6%
Other values (4) 9100
 
2.8%

BIA_NAME
Categorical

HIGH CORRELATION
MISSING

Distinct6
Distinct (%)< 0.1%
Missing63207
Missing (%)81.0%
Memory size609.8 KiB
13414 
CK
 
443
MLT
 
362
PC
 
304
STR
 
215

Length

Max length3
Median length1
Mean length1.1399663
Min length1

Characters and Unicode

Total characters16900
Distinct characters10
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value      Count  Frequency (%)
(blank)    13414  17.2%
CK         443    0.6%
MLT        362    0.5%
PC         304    0.4%
STR        215    0.3%
CLV        87     0.1%
(Missing)  63207  81.0%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot not reproduced in this text version.]
ValueCountFrequency (%)
ck 443
31.4%
mlt 362
25.7%
pc 304
21.5%
str 215
15.2%
clv 87
 
6.2%

Most occurring characters

ValueCountFrequency (%)
13414
79.4%
C 834
 
4.9%
T 577
 
3.4%
L 449
 
2.7%
K 443
 
2.6%
M 362
 
2.1%
P 304
 
1.8%
S 215
 
1.3%
R 215
 
1.3%
V 87
 
0.5%

Most occurring categories

ValueCountFrequency (%)
Space Separator 13414
79.4%
Uppercase Letter 3486
 
20.6%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
C 834
23.9%
T 577
16.6%
L 449
12.9%
K 443
12.7%
M 362
10.4%
P 304
 
8.7%
S 215
 
6.2%
R 215
 
6.2%
V 87
 
2.5%
Space Separator
ValueCountFrequency (%)
13414
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 13414
79.4%
Latin 3486
 
20.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
C 834
23.9%
T 577
16.6%
L 449
12.9%
K 443
12.7%
M 362
10.4%
P 304
 
8.7%
S 215
 
6.2%
R 215
 
6.2%
V 87
 
2.5%
Common
ValueCountFrequency (%)
13414
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 16900
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
13414
79.4%
C 834
 
4.9%
T 577
 
3.4%
L 449
 
2.7%
K 443
 
2.6%
M 362
 
2.1%
P 304
 
1.8%
S 215
 
1.3%
R 215
 
1.3%
V 87
 
0.5%

BIAFulName
Categorical

HIGH CORRELATION
MISSING

Distinct6
Distinct (%)< 0.1%
Missing63207
Missing (%)81.0%
Memory size609.8 KiB
13414 
Cooksville BIA
 
443
Malton BIA
 
362
Port Credit BIA
 
304
Streetsville BIA
 
215

Length

Max length16
Median length1
Mean length2.177403
Min length1

Characters and Unicode

Total characters32280
Distinct characters20
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value             Count  Frequency (%)
(blank)           13414  17.2%
Cooksville BIA    443    0.6%
Malton BIA        362    0.5%
Port Credit BIA   304    0.4%
Streetsville BIA  215    0.3%
Clarkson BIA      87     0.1%
(Missing)         63207  81.0%
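
The counts for BIAFulName match those of BIA_NAME exactly (443, 362, 304, 215, 87), which suggests the short codes map one-to-one onto the full names. A minimal sketch, assuming pandas, of checking that on a DataFrame. The mapping below is inferred only from the matching counts and should be treated as an assumption, and the sample rows are made up:

```python
import pandas as pd

# Assumed code-to-name mapping, inferred from identical counts in the two tables.
BIA_NAMES = {
    "CK": "Cooksville BIA",
    "MLT": "Malton BIA",
    "PC": "Port Credit BIA",
    "STR": "Streetsville BIA",
    "CLV": "Clarkson BIA",
}

# Toy rows, not taken from the dataset.
df = pd.DataFrame({
    "BIA_NAME": ["CK", "MLT", "CK", "CLV"],
    "BIAFulName": ["Cooksville BIA", "Malton BIA", "Cooksville BIA", "Clarkson BIA"],
})

# Rows where the full name differs from what the code would predict.
mismatches = df[df["BIA_NAME"].map(BIA_NAMES) != df["BIAFulName"]]
print(len(mismatches))
```

On the real data the blank and missing rows would need to be filtered out first, since they have no entry in the mapping and would otherwise show up as mismatches. If the check passes, one of the two columns is redundant, which would also explain the HIGH CORRELATION alert on both.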

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot not reproduced in this text version.]
ValueCountFrequency (%)
bia 1411
45.1%
cooksville 443
 
14.2%
malton 362
 
11.6%
port 304
 
9.7%
credit 304
 
9.7%
streetsville 215
 
6.9%
clarkson 87
 
2.8%

Most occurring characters

ValueCountFrequency (%)
15129
46.9%
l 1765
 
5.5%
o 1639
 
5.1%
A 1411
 
4.4%
B 1411
 
4.4%
I 1411
 
4.4%
t 1400
 
4.3%
e 1392
 
4.3%
i 962
 
3.0%
r 910
 
2.8%
Other values (10) 4850
 
15.0%

Most occurring categories

ValueCountFrequency (%)
Space Separator 15129
46.9%
Lowercase Letter 11203
34.7%
Uppercase Letter 5948
 
18.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
l 1765
15.8%
o 1639
14.6%
t 1400
12.5%
e 1392
12.4%
i 962
8.6%
r 910
8.1%
s 745
6.7%
v 658
 
5.9%
k 530
 
4.7%
a 449
 
4.0%
Other values (2) 753
6.7%
Uppercase Letter
ValueCountFrequency (%)
A 1411
23.7%
B 1411
23.7%
I 1411
23.7%
C 834
14.0%
M 362
 
6.1%
P 304
 
5.1%
S 215
 
3.6%
Space Separator
ValueCountFrequency (%)
15129
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 17151
53.1%
Common 15129
46.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
l 1765
10.3%
o 1639
9.6%
A 1411
 
8.2%
B 1411
 
8.2%
I 1411
 
8.2%
t 1400
 
8.2%
e 1392
 
8.1%
i 962
 
5.6%
r 910
 
5.3%
C 834
 
4.9%
Other values (9) 4016
23.4%
Common
ValueCountFrequency (%)
15129
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 32280
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
15129
46.9%
l 1765
 
5.5%
o 1639
 
5.1%
A 1411
 
4.4%
B 1411
 
4.4%
I 1411
 
4.4%
t 1400
 
4.3%
e 1392
 
4.3%
i 962
 
3.0%
r 910
 
2.8%
Other values (10) 4850
 
15.0%

RecordID
Real number (ℝ)

Distinct 21240
Distinct (%) 27.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 34656.267
Minimum 2
Maximum 94424
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 609.8 KiB

Quantile statistics

Minimum 2
5-th percentile 2230
Q1 9764
median 19182.5
Q3 55026
95-th percentile 88915
Maximum 94424
Range 94422
Interquartile range (IQR) 45262

Descriptive statistics

Standard deviation 29857.312
Coefficient of variation (CV) 0.86152708
Kurtosis -0.99364033
Mean 34656.267
Median Absolute Deviation (MAD) 16019.5
Skewness 0.65057392
Sum 2.7042978 × 10^9
Variance 8.9145909 × 10^8
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1055 5
 
< 0.1%
20882 5
 
< 0.1%
19580 5
 
< 0.1%
20871 5
 
< 0.1%
19831 5
 
< 0.1%
19332 5
 
< 0.1%
19583 5
 
< 0.1%
19832 5
 
< 0.1%
19584 5
 
< 0.1%
20872 5
 
< 0.1%
Other values (21230) 77982
99.9%
ValueCountFrequency (%)
2 2
 
< 0.1%
7 5
< 0.1%
10 5
< 0.1%
12 3
< 0.1%
16 5
< 0.1%
18 5
< 0.1%
20 5
< 0.1%
21 5
< 0.1%
23 5
< 0.1%
26 4
< 0.1%
ValueCountFrequency (%)
94424 1
< 0.1%
94423 1
< 0.1%
94419 1
< 0.1%
94371 1
< 0.1%
94321 1
< 0.1%
94319 1
< 0.1%
94318 1
< 0.1%
94317 1
< 0.1%
94313 1
< 0.1%
94293 1
< 0.1%
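
RecordID has 21240 distinct values across 78032 rows, and the most frequent IDs each appear 5 times, the same as the number of distinct Year values. One plausible reading, which the report does not confirm, is that each record appears at most once per year. A minimal sketch, assuming pandas and made-up rows, of testing that reading:

```python
import pandas as pd

# Toy rows; RecordID 12 deliberately appears twice in 2016.
df = pd.DataFrame({
    "RecordID": [7, 7, 7, 12, 12, 12],
    "Year": [2016, 2017, 2018, 2016, 2016, 2019],
})

# Rows that share a (RecordID, Year) pair with at least one other row.
dupes = df[df.duplicated(subset=["RecordID", "Year"], keep=False)]
# Number of rows per RecordID, to compare against the number of years.
rows_per_id = df.groupby("RecordID").size()

print(dupes)
print(rows_per_id.max())
```

An empty `dupes` frame on the real data would make (RecordID, Year) usable as a composite key.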

isnew
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size76.3 KiB
False
56546 
True
21486 
Value  Count  Frequency (%)
False  56546  72.5%
True   21486  27.5%

Closed
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size76.3 KiB
False
71617 
True
 
6415
Value  Count  Frequency (%)
False  71617  91.8%
True   6415   8.2%
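
The overall share of closed records is 8.2% (6415 of 78032). Because the mean of a boolean column equals its share of True values, the same figure can be broken down by Year with a group-wise mean. A minimal sketch, assuming pandas and made-up rows:

```python
import pandas as pd

# Toy rows, not taken from the dataset.
df = pd.DataFrame({
    "Year": [2016, 2016, 2017, 2017],
    "Closed": [False, True, False, False],
})

# Mean of a boolean column per group gives the share of True values.
closed_share = df.groupby("Year")["Closed"].mean()
print(closed_share)
```

The same pattern applies to the isnew flag.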

Interactions

[Figure: 121 pairwise interaction plots (11 × 11 numeric variables); images not reproduced here.]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
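
The calculation described above can be sketched in plain Python as Pearson's r applied to the ranks. This is a minimal illustration, not the code used to produce this report; the function names and the toy data are assumptions, and ties are handled by giving tied values the average of their rank positions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing in x

rho = spearman_rho(x, y)
r = pearson(x, y)
print("Spearman rho:", rho)
print("Pearson r:", r)
```

On this toy data the relationship is perfectly monotonic but not linear, so ρ is 1 (up to floating-point error) while r stays below 1.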

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the steepness of the line, that is, its angle to the x-axis, does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
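
As a minimal sketch of this formula, the snippet below computes r directly from the covariance and the standard deviations, then checks the invariance under separate shifts and positive rescalings of each variable. The function name and the toy data are assumptions made for illustration only.

```python
from statistics import mean


def pearson_r(x, y):
    """Sample covariance of x and y over the product of their sample std devs."""
    n = len(x)
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sx = (sum((a - mx) ** 2 for a in x) / (n - 1)) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / (n - 1)) ** 0.5
    return cov / (sx * sy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # roughly y = 2x, with small deviations

r = pearson_r(x, y)

# Shift and rescale each variable separately, using positive scale factors.
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_transformed = pearson_r(x_shifted, y_shifted)

print("r:", r)
print("r after shift/rescale:", r_transformed)
```

The two printed values agree up to floating-point error. A negative scale factor on one variable would flip the sign of r while leaving its magnitude unchanged.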

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
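
The pair-counting definition above can be written out directly. The sketch below implements it exactly as stated (often called tau-a), where tied pairs count as neither concordant nor discordant. Library implementations commonly use a tie-adjusted variant, so results may differ on data with ties. The function name and the toy data are assumptions.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 means a tie in x or y: counted as neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

tau = kendall_tau(x, y)
print("tau:", tau)
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6. The double loop is quadratic in the number of observations, which is fine for illustration but slow on a dataset of this size.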

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
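
A minimal sketch of a bias-corrected Cramér's V follows. It uses the correction as it is usually stated for Bergsma's 2013 proposal: subtract (k-1)(r-1)/(n-1) from φ² = χ²/n, floor the result at 0, and shrink the row and column counts r and k accordingly. Whether the report generator computes it in exactly this way (for example, which χ² variant it uses) is an assumption. The function name and the toy data are illustrative.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    joint = Counter(zip(a, b))
    row_totals = Counter(a)
    col_totals = Counter(b)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for i, row_count in row_totals.items():
        for j, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = joint.get((i, j), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return sqrt(phi2_corr / denom)


# Toy data: b is fully determined by a (perfect association).
a = ["x", "y"] * 50
b = ["p", "q"] * 50
v_perfect = cramers_v_corrected(a, b)

# Toy data: every combination equally frequent (independence).
c = ["x", "x", "y", "y"] * 25
d = ["p", "q", "p", "q"] * 25
v_independent = cramers_v_corrected(c, d)

print("perfect association:", v_perfect)
print("independence:", v_independent)
```

On these two toy cases the corrected coefficient reaches the ends of its range: 1 for the fully determined pair and 0 for the balanced independent pair.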

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
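
One common way to compute a nullity correlation is to take Pearson's r between the presence indicators of two columns (1 where a value is present, 0 where it is missing). The sketch below assumes that definition. The function name is hypothetical, missing values are represented by `None`, and the toy columns are invented for illustration and are not taken from this dataset.

```python
def nullity_correlation(col_a, col_b):
    """Pearson's r between the 'value is present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # Undefined when a column is entirely present or entirely missing.
        return None
    return cov / (var_a * var_b) ** 0.5


# Hypothetical columns; None marks a missing cell.
col_1 = [1, None, 3, None]
col_2 = [9, None, 7, None]   # missing in exactly the same rows as col_1
col_3 = [None, 5, None, 6]   # missing exactly where col_1 is present
col_4 = [1, 2, 3, 4]         # never missing

same = nullity_correlation(col_1, col_2)
opposite = nullity_correlation(col_1, col_3)
undefined = nullity_correlation(col_1, col_4)

print(same, opposite, undefined)
```

A value near 1 means the two columns tend to be filled in (or missing) together, a value near -1 means one tends to be present when the other is missing, and columns with no variation in missingness have no defined value.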

Sample

X Y FID BusinessID Name Address StreetNo StreetName BldgNo UnitNo PostalCode Location Ward NAICSCode NAICSCat NAICSDescr Phone Fax TollFree EMail WebAddress EmplRange EmplUpdate Sector_Des CENT_X CENT_Y Year PIN Character CHArea Modified BIA_NAME BIAFulName RecordID isnew Closed
0-79.68982943.64418111055Golf Trends Inc.300 Ambassador Dr300Ambassador DrL5T 2J3Gateway EA (East)5414470WholesaleAmusement and Sporting Goods Wholesaler-Distributors905-795-8900905-795-89881-800-668-1101lfinch@golftrendsinc.comwww.golftrendsinc.com10 to 192015/10/31 00:00:00+00605668.25384.833187e+062016NaNNaNNaNNaNNaNNaN1055TrueNo
1-79.68941943.64498821057Apex Graphics Inc.320 Ambassador Dr320Ambassador DrL5T 2J3Gateway EA (East)5323120ManufacturingSupport Activities for Printing905-795-9575905-795-8775prepress@apexgraphics.comwww.apexgraphics.com20 to 492016/10/31 00:00:00+00605699.93704.833277e+062016NaNNaNNaNNaNNaNNaN1057TrueNo
2-79.68941943.64498831058Sands, John & Associates Limited320 Ambassador Dr320Ambassador DrL5T 2J3Gateway EA (East)5323120ManufacturingSupport Activities for Printing905-795-9519905-795-877550 to 992015/10/31 00:00:00+00605699.93704.833277e+062016NaNNaNNaNNaNNaNNaN1058TrueNo
3-79.68941943.64498841060Printmedia-Tackaberry Times320 Ambassador Dr320Ambassador DrL5T 2J3Gateway EA (East)5323119ManufacturingOther Printing905-564-8121905-564-7395info@printmedia.cawww.printmedia.ca1 to 42015/10/31 00:00:00+00605699.93704.833277e+062016NaNNaNNaNNaNNaNNaN1060TrueNo
4-79.69066443.64549351061S W R Industries Ltd.321 Ambassador Dr321Ambassador DrL5T 2J3Gateway EA (East)5417230WholesaleIndustrial Machinery, Equipment and Supplies Wholesaler-Distributors905-564-8080905-564-5003shsieh@swrltd.comwww.swrltd.com5 to 92015/10/31 00:00:00+00605598.64424.833332e+062016NaNNaNNaNNaNNaNNaN1061TrueNo
5-79.69027743.64637261063Crossdock Freight Solutions361 Ambassador Dr361Ambassador DrL5T 2J3Gateway EA (East)5488519TransportationOther Freight Transportation Arrangement905-670-4937905-670-9475customerassist@crossdocksystems.comwww.crossdockfreight.com20 to 492015/10/31 00:00:00+00605628.28384.833430e+062016NaNNaNNaNNaNNaNNaN1063TrueNo
6-79.68987743.64691471065Green Belting Industries Ltd.381 Ambassador Dr381Ambassador DrL5T 2J3Gateway EA (East)5325510ManufacturingPaint and Coating Manufacturing905-564-6712905-564-67091-800-668-1114customerservice@greenbelting.comwww.greenbelting.com50 to 992016/10/31 00:00:00+00605659.56464.833490e+062016NaNNaNNaNNaNNaNNaN1065TrueNo
7-79.63427943.64040481073Dafco Filtration Group Corporation5390 Ambler Dr5390Ambler DrBL4W 1G9Northeast EA (West)5333413ManufacturingIndustrial and Commercial Fan and Blower and Air Purification Equipment Manufacturing905-602-1010905-629-1124info@dafcofiltrationgroup.comwww.dafco.ca50 to 992016/10/31 00:00:00+00610155.41824.832840e+062016NaNNaNNaNNaNNaNNaN1073TrueNo
8-79.63284443.64133791074Ace Trans Inc.5391 Ambler Dr5391Ambler Dr1L4W 1H1Northeast EA (West)5493110TransportationGeneral Warehousing and Storage905-625-3000905-625-6049info@acetrans.cawww.acetrans.ca1 to 42016/10/31 00:00:00+00610269.46404.832945e+062016NaNNaNNaNNaNNaNNaN1074TrueNo
9-79.63781543.642638101077Petro Maxx5510 Ambler Dr5510Ambler Dr1 to 2L4W 2V1Northeast EA (West)5541490ProfessionalOther Specialized Design Services905-206-0040blake@petromaxx.cawww.maxxgroupofcompanies.ca20 to 492015/10/31 00:00:00+00609866.14524.833083e+062016NaNNaNNaNNaNNaNNaN1077TrueNo
X Y FID BusinessID Name Address StreetNo StreetName BldgNo UnitNo PostalCode Location Ward NAICSCode NAICSCat NAICSDescr Phone Fax TollFree EMail WebAddress EmplRange EmplUpdate Sector_Des CENT_X CENT_Y Year PIN Character CHArea Modified BIA_NAME BIAFulName RecordID isnew Closed
78022608544.36644.840490e+061481657550Advance Car & Truck Rental2960 Drew Rd2960Drew Rd149L4T 0A5NaN5532111Real EstatePassenger Car Rental905-461-7368905-461-66661-877-303-7368Advancerental@gmail.comwww.advancerental.ca1 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2021/06/22 00:00:00+00MLTMalton BIA57550FalseNo
78023608544.36644.840490e+061481757551Video Palace2960 Drew Rd2960Drew Rd150L4T 0A5NaN5532280Real EstateAll Other Consumer Goods Rental905-678-78781 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2021/06/02 00:00:00+00MLTMalton BIA57551FalseNo
78024608544.36644.840490e+061481857552Secure Life Insurance Agency Inc.2960 Drew Rd2960Drew Rd151L4T 0A5NaN5524112FinanceDirect Group Life, Health and Medical Insurance Carriers1-800-746-9122www.securelifeinsurance.ca1 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2018/12/30 00:00:00+00MLTMalton BIA57552FalseNo
78025608544.36644.840490e+061481957555Skillman Flooring2960 Drew Rd2960Drew Rd155&157BL4T 0A5NaN5442210RetailFloor Covering Stores905-676-9111905-676-9113skillmanflooring@live.cawww.skillmanflooring.com1 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2019/12/12 00:00:00+00MLTMalton BIA57555FalseNo
78026608544.36644.840490e+061482057557Verma Vastar Manufacturing Inc.2960 Drew Rd2960Drew Rd160L4T 0A5NaN5315210ManufacturingCut and Sew Clothing Contracting647-669-45451 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2018/12/30 00:00:00+00MLTMalton BIA57557FalseNo
78027608544.36644.840490e+061482160142JobsForU2960 Drew Rd2960Drew Rd156L4T 0A5NaN5561310AdministrativeEmployment Placement Agencies and Executive Search Services416-825-4000navjot@jobsforu.cawww.jobsforu.ca10 to 19NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2021/07/30 00:00:00+00MLTMalton BIA60142TrueNo
78028608544.36644.840490e+061482260159Elite Source Solutions2980 Drew Rd2980Drew Rd133L4T 0A7NaN5561310AdministrativeEmployment Placement Agencies and Executive Search Services905-598-35421 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2018/12/30 00:00:00+00MLTMalton BIA60159TrueNo
78029608544.36644.840490e+061482360160Indian Sweet Master2980 Drew Rd2980Drew Rd134L4T 0A7NaN5722511AccommodationFull-service restaurants905-405-85851 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2018/12/30 00:00:00+00MLTMalton BIA60160TrueNo
78030608544.36644.840490e+061482460161Mississauga Flooring & Supplies Inc.2980 Drew Rd2980Drew Rd135 & 136L4T 0A7NaN5414320WholesaleFloor Covering Wholesaler-Distributors905-460-70051 to 4NaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2021/08/16 00:00:00+00MLTMalton BIA60161TrueNo
78031608544.36644.840490e+061482560162Punjabi Textile Ltd.2980 Drew Rd2980Drew Rd132L4T 0A7NaN5414110WholesaleClothing and Clothing Accessories Wholesaler-Distributors905-405-1919NaNNaNNaNNaNNaN202124265600.0NaNNortheast EA (West)2018/12/30 00:00:00+00MLTMalton BIA60162TrueNo